AD-A042 646

UNCLASSIFIED

ILLINOIS UNIV AT URBANA-CHAMPAIGN COORDINATED SCIENCE LAB    F/G 9/2

MEMORY ORGANIZATIONS AND THEIR EFFECTIVENESS FOR MULTIPROCESSIN--ETC(U)

MAY 77    F A BRIGGS    DAAB07-72-C-0259

R-768

NL

1 OF 3

COORDINATED SCIENCE LABORATORY


MEMORY  ORGANIZATIONS 
AND  THEIR  EFFECTIVENESS 
FOR  MULTIPROCESSING  COMPUTERS 

FAYE  ALAYE  BRIGGS 


D D C 


UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

1.  REPORT NUMBER
2.  GOVT ACCESSION NO.
3.  RECIPIENT'S CATALOG NUMBER
4.  TITLE (and Subtitle): MEMORY ORGANIZATIONS AND THEIR EFFECTIVENESS
    FOR MULTIPROCESSING COMPUTERS
5.  TYPE OF REPORT & PERIOD COVERED: Technical Report
6.  PERFORMING ORG. REPORT NUMBER: R-768; UILU-ENG-77-2215
7.  AUTHOR(s): Faye Alaye Briggs
8.  CONTRACT OR GRANT NUMBER(s): DAAB-07-72-C-0259; MCS 73-03488 A01
9.  PERFORMING ORGANIZATION NAME AND ADDRESS:
    Coordinated Science Laboratory
    University of Illinois at Urbana-Champaign
    Urbana, Illinois 61801
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS
11. CONTROLLING OFFICE NAME AND ADDRESS: Joint Services Electronics Program
12. REPORT DATE: May 1977
13. NUMBER OF PAGES: 198
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE
16. DISTRIBUTION STATEMENT (of this Report):
    Approved for public release; distribution unlimited
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if
    different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by
    block number):
    Memory Organization
    Interleaved Memories
    Multiprocessor Memory
    Memory Access Conflict
20. ABSTRACT (Continue on reverse side if necessary and identify by
    block number)

Organizations of interleaved multimodule semiconductor memories are studied
to facilitate accessing of memory words by parallel-pipelined multiple
instruction stream processors. All memory modules are assumed to be
identical and are characterized by the address cycle (address hold time)
and memory cycle of a and c time units respectively. A total of
$N$ $(= 2^n)$ memory modules are arranged such that there are
$\ell$ $(= 2^b)$ lines for addresses and $m$ $(= 2^{n-b})$ memory modules
per line.

DD FORM 1473    EDITION OF 1 NOV 65 IS OBSOLETE    UNCLASSIFIED

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


UNCLASSIFIED 

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


20.  ABSTRACT  (continued) 

For a parallel-pipelined processor of order (s, p), which consists
of p parallel processors each of which is a pipelined processor with s
degrees of multiprogramming, there can be up to $s \cdot p$ memory
requests in each instruction cycle. The memory interference problems
which arise in such systems are investigated.

Performance is evaluated as a function of the memory configuration
$(\ell, m)$, the module characteristics (a, c), and the processor order
(s, p). Results show that for reasonably large values of N, high
performance can be obtained even in the nonbuffered case when $\ell$ is
$a \cdot p$ or more. Buffering has its maximum effect on performance when
$\ell$ is near $a \cdot p$. When $\ell$ must be greater than $a \cdot p$
for adequate performance in the nonbuffered case, buffering can be used to
reduce $\ell$ while maintaining performance.

Some design tradeoffs are discussed and examples are given to illustrate
the wide variety of design options that can be obtained.


UNCLASSIFIED 



UILU-ENG  77-2215 



MEMORY  ORGANIZATIONS  AND  THEIR 
EFFECTIVENESS  FOR  MULTIPROCESSING  COMPUTERS 

by 

Faye Alaye Briggs

This work was supported in part by the Joint Services
Electronics Program (U.S. Army, U.S. Navy and U.S. Air Force)
under Contract DAAB-07-72-C-0259 and in part by the National
Science Foundation under Grant MCS 73-03488 A01.


Reproduction  in  whole  or  in  part  is  permitted  for  any 
purpose  of  the  United  States  Government. 


Approved  for  public  release.  Distribution  unlimited. 


D D C

AUG 9 1977


MEMORY  ORGANIZATIONS  AND  THEIR  EFFECTIVENESS 


FOR  MULTIPROCESSING  COMPUTERS 


FAYE ALAYE BRIGGS

B. Eng., Ahmadu Bello University, 1971
M.S., Stanford University, 1974


THESIS 


Submitted  in  partial  fulfillment  of  the  requirements 
for  the  degree  of  Doctor  of  Philosophy  in  Electrical  Engineering 
in the Graduate College of the
University  of  Illinois  at  Urbana-Champaign,  1977 


Thesis  Adviser:  Professor  E.  S.  Davidson 


Urbana,  Illinois 


MEMORY  ORGANIZATIONS  AND  THEIR  EFFECTIVENESS 


FOR  MULTIPROCESSING  COMPUTERS 

Faye  Alaye  Briggs,  Ph.D. 

Coordinated  Science  Laboratory  and 
Department  of  Electrical  Engineering 
University  of  Illinois  at  Urbana-Champaign,  1977 

Organizations of interleaved multimodule semiconductor memories are
studied to facilitate accessing of memory words by parallel-pipelined
multiple instruction stream processors. All memory modules are assumed
to be identical and are characterized by the address cycle (address hold
time) and memory cycle of a and c time units respectively. A total of
$N$ $(= 2^n)$ memory modules are arranged such that there are
$\ell$ $(= 2^b)$ lines for addresses and $m$ $(= 2^{n-b})$ memory modules
per line.

For a parallel-pipelined processor of order (s, p), which consists
of p parallel processors each of which is a pipelined processor with s
degrees of multiprogramming, there can be up to $s \cdot p$ memory
requests in each instruction cycle. The memory interference problems
which arise in such systems are investigated.

Performance is evaluated as a function of the memory configuration
$(\ell, m)$, the module characteristics (a, c), and the processor order
(s, p). Results show that for reasonably large values of N, high
performance can be obtained even in the nonbuffered case when $\ell$ is
$a \cdot p$ or more. Buffering has its maximum effect on performance when
$\ell$ is near $a \cdot p$. When $\ell$ must be greater than $a \cdot p$
for adequate performance in the nonbuffered case, buffering can be used to
reduce $\ell$ while maintaining performance.

Some design tradeoffs are discussed and examples are given to
illustrate the wide variety of design options that can be obtained.



ACKNOWLEDGEMENTS

The author wishes to express his gratitude to his advisor,
Dr. Edward Davidson, for his guidance, encouragement and helpful suggestions
throughout this thesis. Dr. Davidson's many hours spent in reading and
refining this thesis are appreciated.

The author is also grateful to his colleagues at the Coordinated
Science Laboratory for creating a vivifying intellectual atmosphere.




TABLE  OF  CONTENTS 



1.  BACKGROUND AND MOTIVATION
1.1.  Introduction
1.2.  Processor Organization
1.3.  Timing Characteristics of Memory Module
1.4.  Some Previous Models
1.5.  Problem Statement
1.6.  Overview of Dissertation

2.  THE L-M MEMORY ORGANIZATION
2.1.  Introduction
2.2.  Memory Configuration
2.3.  Memory Request Scheduling
2.4.  Processor-Memory Interconnection

3.  PERFORMANCE ANALYSIS OF L-M MEMORY ORGANIZATION
3.1.  Introduction
3.2.  State Diagrams for p = 1
3.3.  State Reduction and Line Decomposition
3.4.  Line State Space
3.5.  Probability of Acceptance, $P_A(a, c, p)$
3.6.  Bounds on $P_A(a, c, p)$

4.  SIMULATION OF BUFFERED AND NONBUFFERED REQUESTS
4.1.  Introduction
4.2.  Nonbuffered Request Processor System
4.3.  Buffered Request Processor System
4.4.  Discussion

5.  ANALYSIS OF RESULTS
5.1.  Introduction
5.2.  Effect of Number of Modules (N) on Performance
5.3.  Effect of the Number of Lines on Performance
5.4.  Effect of Module Characteristics on Performance
5.5.  Effect of Processor Order on Performance
5.6.  Effect of Processor Speed on Performance
5.7.  Effect of Buffering on Performance
5.8.  Design Tradeoffs
5.9.  Burst Mode Operation

6.  CONCLUSIONS
6.1.  Summary of Results
6.2.  Suggestions for Further Research

APPENDIX A

LIST OF REFERENCES


1.  BACKGROUND AND MOTIVATION

1.1 Introduction

In the quest for higher performance in computer systems, two
architectural techniques, namely parallelism and pipelining, evolved
to enhance the computation capability of the systems. In addition to
these architectural alternatives, higher performance may also be achieved
by increasing the switching speed of the electronic components. These
three methods of improving performance are not necessarily mutually
exclusive. Although performance may be improved by the above techniques,
it may be degraded considerably if the memory system is organized
inefficiently and does not match the processor system in speed.
Furthermore, a very efficient memory organization for multiprocessor
systems may be cost prohibitive. These factors have prompted extensive
investigation into techniques for organizing memories for multiprocessor
systems.

In some highly parallel processor systems, concurrency is achieved
by the multiplicity of independent processing units which execute
separate instruction streams on separate data streams [1]. However, there
exist other highly parallel computer systems, like ILLIAC IV, which are
characteristically array processors that perform the same computation
on a large collection of related data elements simultaneously [2]. In
this research, parallel processors will refer to the former organization.

Parallelism or concurrency of instructions and data transfers also
occurs in pipelined computers, which have recently become common.
Pipelining is a technique of decomposing a sequential process into a
sequence of computation steps, each of which can be processed in a
special functionally dedicated and autonomous unit, called a segment,
which operates concurrently with other segments. Hence a pipelined
processor is composed of segments which are arranged so that consecutive
steps of an instruction can be assigned to distinct segments of the
pipeline for processing.

One form of pipelining occurs in the highly partitioned and
overlapped instruction execution technique implemented in the IBM 360/91,
which achieves a high efficiency by the concurrency of instructions
and data transfer [3]. Another form of pipelining is "stream"
pipelining, which performs the same arithmetic operation on a series of
operands as they flow through the pipe. Examples of these are the CDC
STAR-100 [4] and the TI-ASC [5].

Such highly concurrent processors will be characterized and a
description of the general model of the processor organization will be
presented in the next section.

In a multiprocessor environment, main memory is a prime system
resource which is usually shared by all the processors. Hence care
must be taken in the organization of the memory system to avoid severe
performance degradation due to memory interference caused by two or
more processors simultaneously attempting to access the same module
of the memory system.

It would be undesirable to have one monolithic unit of memory
shared among several processors, as this would result in serious
memory interference. Hence the memory is partitioned into several
independent memory modules. This scheme reduces interference by
allowing simultaneous access to more than one module, but considerable
interference can still result if the memory addresses are contiguous
within a module. Interleaving of memory modules is used to alleviate
this problem.
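The contrast between contiguous and interleaved addressing can be sketched as follows. This is a minimal illustration that assumes low-order interleaving (module number = address mod number of modules); the text above does not fix a particular interleaving scheme, and the function names, module count and module size are hypothetical.

```python
def interleaved_module(address: int, n_modules: int) -> int:
    """Low-order interleaving (assumed): consecutive addresses fall in consecutive modules."""
    return address % n_modules


def contiguous_module(address: int, words_per_module: int) -> int:
    """Contiguous allocation: a run of consecutive addresses stays in one module."""
    return address // words_per_module


N_MODULES = 4          # hypothetical module count
WORDS_PER_MODULE = 8   # hypothetical module size

addresses = list(range(8))
interleaved = [interleaved_module(a, N_MODULES) for a in addresses]
contiguous = [contiguous_module(a, WORDS_PER_MODULE) for a in addresses]

print("interleaved:", interleaved)
print("contiguous: ", contiguous)
```

Under these assumptions, eight consecutive addresses are spread over all four modules when interleaved, whereas with contiguous allocation they all contend for a single module.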

In most highly concurrent computer systems, interleaving of memory
modules is also required in order to obtain a balance between effective
processor and memory cycles. For example, the CDC STAR-100 has a
processor cycle of 40 ns and a memory cycle of 1280 ns [4]. At most
one memory reference can be made in one processor cycle. For such fast
processors, the rate at which data can be transferred between processors
and main memory is often limited by the transfer capabilities of the
memory itself and the memory busses. Hence the memory is usually
organized to meet the memory bandwidth requirements of the system. The
memory bandwidth is the rate at which memory can transfer information,
usually represented in words per second.
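As a rough illustration of this balance, the sketch below computes how many interleaved modules would be needed to sustain one reference per processor cycle, using the STAR-100 cycle times quoted above. It assumes an idealized, conflict-free access pattern in which each module delivers one word per memory cycle; the function names are hypothetical and the result is not a statement about the actual STAR-100 memory design.

```python
import math


def min_interleaved_modules(processor_cycle_ns: int, memory_cycle_ns: int) -> int:
    """Smallest module count that keeps up with one request per processor cycle.

    Assumes ideal, conflict-free interleaving (integer ceiling division).
    """
    return -(-memory_cycle_ns // processor_cycle_ns)


def peak_bandwidth_words_per_second(n_modules: int, memory_cycle_ns: int) -> float:
    """Peak bandwidth if every module completes one word per memory cycle."""
    return n_modules / (memory_cycle_ns * 1e-9)


modules = min_interleaved_modules(40, 1280)
bandwidth = peak_bandwidth_words_per_second(modules, 1280)
print(modules, bandwidth)
```

With a 40 ns processor cycle and a 1280 ns memory cycle, the ratio is 32, so under these assumptions 32 modules would be required for the memory to match the processor request rate.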

On  the  other  hand,  some  processors,  such  as  microprocessors,  exhibit 
processor  cycles  which  are  usually  slower  than  the  memory  cycles  of  some 
memories.  In  such  cases,  the  memories  are  usually  underutilized,  unless 
the  processors  are  organized  to  create  a balance  between  the  processor 
and  memory  cycles. 

In the past, magnetic memories have been used in the main memory
systems of multiprocessor systems. However, with the advent of large
scale integrated circuits, semiconductor memories have been playing
increasingly important roles in the synthesis of main memory systems.
Their inherent modularity makes them very appealing in the design of
multimodule memory systems for multiprocessor systems. In addition to
their flexibility and nondestructive readout capability, the cost per
bit of semiconductor memories remains virtually constant as the module
size increases or decreases for a wide range of module sizes [6]. In
contrast, the cost per bit of magnetic memories, such as ferrite core,
rises very rapidly for decreasing module size. Furthermore, some
current semiconductor memories exhibit timing characteristics which may be
exploited to enhance the data transfer capabilities of multiple
instruction stream computer systems. These timing characteristics will
be discussed in Section 1.3.

Hence in this research, we describe a method for exploiting the
capabilities of semiconductor memories to obtain an effective
multimodule memory organization for parallel-pipelined multiple instruction
stream processors. Furthermore, the memory interference problems in
such processor-memory systems are investigated.

1.2  Processor  Organization 

The concept of multiprocessors was introduced in Section 1.1.
The processor organization to be discussed in this section is a
theoretical model chosen to include a broad class of multiple instruction
stream multiple data stream (MIMD) processors [1]. A formal definition
of the pipelined processor model adopted here is given below.

Definition 1.2.1  A pipelined processor of order s is modeled as an
ordered set of s segments $(S_0, S_1, \ldots, S_{s-1})$ which can
simultaneously be processing a distinct step or phase of a distinct
instruction. □



Once an instruction is initiated in a segment, it flows from
segment to segment for its execution, where each segment performs a specific
suboperation on a distinct phase of the instruction. It is considered
that each segment has an output latch or register to help retain its
autonomy. Figure 1.2.1 shows a nonpipelined processor as one monolithic
unit, and Figure 1.2.2 illustrates a pipelined processor of order 3.
The pipelined processor defined above can be implemented in two
different ways, namely, as a single instruction stream or a multiple
instruction stream pipelined processor. The following definition will
be of aid in understanding the two different implementations.


Definition 1.2.2  The rth process or instruction stream, $I(r)$, is a
sequence of instructions that require execution. Thus,

$$I(r) = \alpha_{1r}, \alpha_{2r}, \alpha_{3r}, \ldots$$

where $\alpha_{ij}$ = ith instruction from the jth instruction stream. □


Figure 1.2.3 is a space-time illustration of one form of pipelined
processor of order 6. In this scheme, the execution of instructions from
the same stream is overlapped. The problems usually associated with
single instruction stream pipelined processors are performance
degradation and control problems due to data dependencies and branch
instructions.

Figure 1.2.1  A nonpipelined processor as a monolithic unit

In this research, multiple instruction stream pipelined processor
organizations were adopted, since performance degradation and control
problems due to data dependencies and branch instructions are absent.
Concurrency in such processors is achieved only between distinct
instruction streams, as illustrated in Figure 1.2.4. In this scheme each
instruction is partitioned into s distinct phases and each distinct phase
is sequentially assigned to each distinct segment. In general, s
distinct processes are in execution concurrently, and if an instruction from
a process is initiated at time instant t, the next instruction from the
same process will be initiated at time instant t + s. Hence there is
no execution overlap between instructions from the same stream.
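The multiple instruction stream scheme just described can be sketched as a small space-time table. This is an illustrative sketch only: it assumes round-robin initiation of the s streams, one initiation per STU, and a straight-through flow through segments 0 to s - 1; the function name and the table layout are hypothetical.

```python
def multistream_schedule(s: int, total_time: int):
    """Build table[k][t] = (i, j): instruction i of stream j occupies segment k at time t.

    Assumes round-robin initiation, so stream j starts its ith instruction
    at time (i - 1) * s + (j - 1). Empty slots are None.
    """
    table = [[None] * total_time for _ in range(s)]
    for start in range(total_time):
        i = start // s + 1      # instruction index within its stream
        j = start % s + 1       # stream index, 1..s
        for k in range(s):      # the instruction reaches segment k at start + k
            t = start + k
            if t < total_time:
                table[k][t] = (i, j)
    return table


schedule = multistream_schedule(6, 13)
for k in reversed(range(6)):
    row = " ".join("a%d%d" % cell if cell else " . " for cell in schedule[k])
    print(f"S{k}  {row}")
```

In this sketch, the first instruction of stream 1 enters segment 0 at time 0 and its second instruction enters at time 6 = 0 + s, so no two instructions from the same stream are ever in the pipeline together.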

Notice in Figure 1.2.4 that all instructions have identical flow
patterns. Pipelines in which all instructions have identical flow
patterns are termed single function pipelines. On the other hand, in a
multifunction pipeline, there are two or more distinct flow patterns
and each instruction may use one of these flow patterns [7]. In this
research it is assumed that the pipelined processor is a single function
pipeline.

In  a pipelined  processor  of  order  s,  s separate  instructions  will 
be  in  different  phases  of  their  execution  steps.  These  s instructions 
are  assumed  to  come  from  distinct  instruction  streams  as  in  [8].  Thus 
the  degree  of  multiprogramming,  for  a pipelined  processor  of  order  s, 
is  also  s. 

The  pipelined  processor  can  be  partitioned  so  that  each  segment 
takes  the  same  time  to  complete  its  execution  step. 

Definition 1.2.3  One segment time unit (STU) is the time, in seconds,
required by each segment to execute each distinct phase of an instruction.
□


Figure 1.2.3  Single instruction stream processing in a pipelined
processor of order 6. (Space-time diagram: segments $S_0$ through $S_5$
on the vertical axis, time on the horizontal axis.)


Figure 1.2.4  Multiple instruction stream processing in a pipelined
processor of order 6. (Space-time diagram: segments $S_0$ through $S_5$
on the vertical axis, time on the horizontal axis.)



Hence if the phases of an instruction are partitioned so that it
takes $\tau$ seconds to execute each phase of the instruction, then one
segment time unit is equal to $\tau$ seconds.

Assume also that a pipelined processor of order s can issue one
memory request per STU; hence s memory requests can be issued in one
instruction cycle, where one instruction cycle = $s \cdot \tau$ seconds.
Notice that the instruction cycle is fixed irrespective of the instruction.
Hence a pipelined processor is characterized by s and $\tau$, the degree of
pipelining and the segment time unit respectively. From now on, all
time units will be expressed as an integer number of STUs, unless
otherwise stated.

A reservation table [9], used to illustrate the flow of computation
through the segments of a pipeline, is shown in Figure 1.2.5 for a
straight-through pipelined processor of order 6.

Following initiation of an instruction process at time instant t,
an X in cell (u, v) indicates that a task requires the segment associated
with row u for segment time interval <t + v, t + v + 1>. Note that the
reservation table for the examples of single and multiple instruction
stream pipelined processors, illustrated in Figures 1.2.3 and 1.2.4
respectively, would be identical to that shown in Figure 1.2.5.
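A reservation table of this kind can be represented directly as a boolean matrix. The sketch below is illustrative: it assumes that "straight-through" means the task uses segment u exactly during interval <t + u, t + u + 1>, so the marks lie on the diagonal; the function names are hypothetical.

```python
def straight_through_reservation_table(s: int):
    """table[u][v] is True when segment u is needed during <t + v, t + v + 1>.

    Assumption: a straight-through pipeline visits segment u at offset v = u.
    """
    return [[u == v for v in range(s)] for u in range(s)]


def render(table) -> str:
    """Draw the table with one row per segment and an X for each reserved cell."""
    lines = []
    for u, row in enumerate(table):
        cells = " ".join("X" if used else "." for used in row)
        lines.append(f"S{u}  {cells}")
    return "\n".join(lines)


reservation = straight_through_reservation_table(6)
print(render(reservation))
```

Because each row and each column carries exactly one X in this sketch, a new instruction can be initiated every STU without two tasks requiring the same segment in the same interval.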

The  generalized  processor  organization  will  now  be  discussed. 

Definition 1.2.4  A parallel-pipelined processor of order (s, p) [10]
is modeled as a set of p identical and independent, but synchronized
processors, each of which is a pipelined processor of order s. □


Figure 1.2.5  Reservation table of a straight-through pipelined processor
of order 6. (Horizontal axis: time instants t through t + 6, in STUs.)


Figure 1.2.6 illustrates various configurations of parallel-
pipelined processors.

A parallel-pipelined processor is thus completely specified by the
degree of pipelining, s, the parallelism, p, and the segment time, $\tau$.
A parallel-pipelined processor of order (s, p) can issue p simultaneous
memory requests each STU. Hence, it can execute $s \cdot p$ distinct
instruction streams concurrently.

The processor bandwidth, $B_p$, defined as the number of memory
requests that can be issued per second, will be adopted as a performance
measurement of the parallel-pipelined processor of order (s, p). Hence,
$B_p = p/\tau$. Note that s is not explicitly involved in the definition
of $B_p$.
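The quantities that characterize a parallel-pipelined processor can be collected in a short sketch. The order (6, 2) and the 50 ns segment time used here are illustrative values only, and the function names are hypothetical.

```python
import math


def processor_bandwidth(p: int, tau_seconds: float) -> float:
    """Memory requests issued per second: p requests every segment time unit."""
    return p / tau_seconds


def instruction_cycle(s: int, tau_seconds: float) -> float:
    """Instruction cycle in seconds: s segment time units."""
    return s * tau_seconds


def concurrent_streams(s: int, p: int) -> int:
    """Number of distinct instruction streams in execution at once."""
    return s * p


s, p, tau = 6, 2, 50e-9   # illustrative order (s, p) and segment time
print(processor_bandwidth(p, tau), instruction_cycle(s, tau), concurrent_streams(s, p))
```

For these values the processor issues 40 million requests per second with 12 streams in execution; changing s alters the instruction cycle and the number of streams but leaves the bandwidth unchanged.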

1.3  Timing Characteristics of Memory Module

In this section, the timing characteristics of the memory module are
discussed to highlight the features which are exploited in the
organization of the memory modules for parallel-pipelined processors.

In Section 1.1, the inherent modularity and cost-effectiveness of
semiconductor memories were pointed out. The cost advantage of
semiconductor memories over core memories is even greater for small
module sizes. These factors account for the increasing use of LSI
memories, instead of core memories, for implementing a flexible multimodule
memory organization. Hence the LSI memory chip is discussed to
highlight the block structures.


Figure 1.2.6  Configurations of parallel-pipelined processors. (Panels:
order (1, 1), nonpipelined processor; order (s, 1), pipelined processor;
order (1, p), parallel nonpipelined processor; order (s, p), parallel
pipelined processor.)


The LSI memory chips are assumed to be fully decoded, fixed address
random access memories [6]. Typically, read only memory (ROM) chips
consist of four functional blocks, as shown in Figure 1.3.1. They include an
address register to latch in the address, an address decoder to select
the addressed word, the ROM matrix and an output data buffer stage.
In some memory chips, the address register and the output data buffer
are not fabricated on the memory chip. However, with recent
developments in LSI technology, the cost of adding these buffers on the memory
chip at fabrication is insignificant. The corresponding simplified
timing diagram for a ROM chip is shown in Figure 1.3.2, where time units
are represented in seconds. Assume for simplicity that the chip select
and address signals are gated into the chip simultaneously, although
in practice the chip select signal may precede or follow the address
signal depending on the timing characteristics of the chip. The
output data is usually valid and available after the memory access time,
$t_A$, of the chip. The duration of the output data, $t_{do}$, depends on
whether the storage elements of the chip are static or dynamic. In
practice, $t_{do}$ is controlled so as to keep nonvalid output off the
data bus.

In the read and write RAM chip, there are typically five principal
functional blocks as shown in Figure 1.3.3. These include the address
register and decoder, the storage cells and the input and output data
buffers. A typical timing diagram for the read cycle of the RAM chip
is identical to the ROM's timing diagram in Figure 1.3.2. However,
Figure 1.3.4 shows the timing diagram for the write cycle of the
RAM chip.


Figure 1.3.1  Functional blocks of read only memory chip

Figure 1.3.4  Timing diagram for write cycle of RAM chip

Let $t_{cs}$ and $t_{ad}$ be the chip select pulse width and the address
pulse width respectively. Furthermore, $t_{di}$ and $t_{we}$ denote the data
input pulse width and the write enable pulse width respectively. In
simple terms, a memory operation is initiated by broadcasting an
address on the address bus for $t_{ad}$ seconds to the address register of
the memory chip, whereupon the address is decoded in the decoder
block to select the corresponding memory bit. The selected memory bit
contents are then read or altered as specified by the read/write request
function.

Definition 1.3.1  The memory cycle, $t_c$, is the time that
a memory chip remains busy after a memory operation is initiated. For a
read memory operation, this is the minimum duration between successive read
requests and is called the read memory cycle, $t_{rc}$. Similarly, for a
write operation, this is called the write memory cycle, $t_{wc}$. □

Definition 1.3.2  The address cycle, $t_a$, is the minimum duration that
the address must be maintained on the address bus of the memory chip for
a successful memory operation. □

Hence in Figures 1.3.2 and 1.3.4, the address cycle $t_a = t_{ad}$.
The end of the input data pulse need not coincide with the end of the
write memory cycle shown in Figure 1.3.4. Usually, the input data can
be presented to the data bus and gated by the write enable pulse after
the memory access time, $t_A$. The chip select pulse width and the
address cycle for the write operation are usually identical to those of
the read operation. However, for some memory chips they need not be so.


The timing constraints are generally obvious from the simplified
timing diagrams for the ROM and RAM chips. However, in general,
$t_{rc} \le t_{wc}$ for RAM chips. Furthermore, for any memory chip,
$t_a \le t_c$. Since the read memory cycle, $t_{rc}$, may not always equal
the write memory cycle, $t_{wc}$, we can, for analytical purposes,
introduce an effective memory cycle which takes into consideration the
distribution of read and write memory requests. For simplicity assume that
the proportions of read and write requests are $f_r$ and $f_w$
respectively. Then,

$$f_r + f_w = 1.$$

Hence the effective memory cycle of the RAM chip is

$$t_{ec} = f_r t_{rc} + f_w t_{wc}.$$

Similarly, if the address cycles for read and write memory operations
differ, the effective address cycle can be obtained.
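The effective memory cycle is a weighted average of the read and write cycles, which can be sketched as below. The read fraction and the cycle times in the example are hypothetical values chosen only to illustrate the arithmetic; the function name is likewise an assumption.

```python
import math


def effective_memory_cycle(f_read: float, t_read: float, t_write: float) -> float:
    """Weighted average of read and write cycles, with f_read + f_write = 1."""
    if not 0.0 <= f_read <= 1.0:
        raise ValueError("f_read must lie between 0 and 1")
    f_write = 1.0 - f_read
    return f_read * t_read + f_write * t_write


# Hypothetical chip: 70% reads, 200 ns read cycle, 300 ns write cycle.
t_ec = effective_memory_cycle(0.7, 200.0, 300.0)
print(t_ec)
```

For these illustrative values the effective cycle lies between the read and write cycles, at about 230 ns, and it moves toward the write cycle as the proportion of writes grows.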


Definition 1.3.3  The memory bandwidth of a RAM chip is

$$B_c = \frac{1}{t_{ec}},$$

which is the request servicing capacity of the chip. □

Typically, each semiconductor memory chip realizes some z words of
1 bit each, where z is a power of 2. A memory module of z words of some
w bits each is then usually organized by interconnecting w chips in a
one dimensional array, as in Figure 1.3.5, so that, amongst other things,
corresponding chip select leads, write enable signals, and addresses are
common. In effect, a memory module has three separate and independent
busses, namely, the address bus, the data-input bus, and the data-output
bus.



Since a memory organization for highly parallel processors is
investigated here, only the maximally parallel bus structure [11] will be
considered for the memory system. In such bus structures, simultaneous
operations can be in progress on the three separate and independent busses.

Since all the chips, which are assumed to be identical, act
simultaneously in a memory module, the memory bandwidth of a module is
identical to the memory bandwidth of a chip in the module.


Definition 1.3.4  The memory bandwidth, $B_m$, of a fixed address RAM
module is

$$B_m = B_c = \frac{1}{t_{ec}},$$

which indicates the request servicing capacity of the module. □


For RAM memory chips and hence modules, the address is held on the
address bus at least as long as data is held on the data bus. That is,

$$0 < t_{do} \le t_{ad} \quad \text{and} \quad 0 < t_{di} \le t_{ad}.$$

Hence the data busses do not pose a limiting constraint on how often
addresses can be dispatched to the address bus and are therefore not
considered here explicitly.

In summary, a memory module is characterized by its address and
memory cycles, $(t_a, t_c)$. They are referred to as the absolute memory
module characteristics, because the cycles $t_a$ and $t_c$ are expressed
in seconds. However, the address and memory cycles can be quantized
such that they are both expressed as integer numbers of STUs, namely,

$$a = \left\lceil \frac{t_a}{\tau} \right\rceil \quad\text{and}\quad c = \left\lceil \frac{t_c}{\tau} \right\rceil,$$

where $\tau$ is the segment time unit in seconds. Hence the relative
module characteristics are $(a, c)$. For example, if $\tau = 50$ ns,
$t_a = 140$ ns and $t_c = 240$ ns, the relative module characteristics
are (3, 5). Since in general $t_a \le t_c$, then $a \le c$. Henceforth,
the relative module characteristics will be referred to simply as the
module characteristics unless otherwise stated.
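The quantization step can be reproduced with a short sketch. It assumes that each cycle is rounded up to a whole number of STUs, which is what the worked example (140 ns and 240 ns at 50 ns per STU giving (3, 5)) implies; the function name is hypothetical.

```python
def relative_characteristics(t_a_ns, t_c_ns, tau_ns):
    """Quantize absolute cycles (integer nanoseconds) into whole STUs.

    Assumption: cycles are rounded up, since a module or line that is busy
    for part of an STU cannot accept a new request during that STU.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    a = -(-t_a_ns // tau_ns)  # ceiling division
    c = -(-t_c_ns // tau_ns)
    return a, c


# The example from the text: tau = 50 ns, t_a = 140 ns, t_c = 240 ns.
print(relative_characteristics(140, 240, 50))
```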

1.4  Some  Previous  Models 

Various analytic and simulation models have been developed to
evaluate the performance of multiprocessor computer systems. In
multiprocessor systems, several tasks are executed concurrently. These
tasks may generate requests to memory simultaneously. Multiple
requests sometimes result in memory conflicts despite the interleaving
of the memory modules.

All  models  presented  here  assume  some  form  of  memory  interleaving 
and  evaluate  the  access  conflict  problem. 

Hellerman [12] presented a model in which a single stream of
intermixed independent instruction and data requests is scanned in
the order of arrival until the first duplicate memory module
number is found. The first K distinct requests are then granted
in parallel. For N interleaved memory modules, the average memory
bandwidth for Hellerman's model is

$$B = \sum_{K=1}^{N} \frac{K^2 (N-1)!}{N^K (N-K)!}.$$

He found that a good numerical approximation of the above expression
when $1 \le N \le 45$ is $N^{0.56}$, accurate to within 4%.
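Hellerman's sum and its power-law approximation can be compared directly. The sketch below evaluates the sum exactly with rational arithmetic and reports the largest relative deviation from $N^{0.56}$ over the stated range; the function name is hypothetical, and the script prints the deviation rather than asserting a particular figure.

```python
from fractions import Fraction
from math import factorial


def hellerman_bandwidth(n):
    """Exact value of sum_{K=1..N} K^2 (N-1)! / (N^K (N-K)!)."""
    return sum(
        Fraction(k * k * factorial(n - 1), n**k * factorial(n - k))
        for k in range(1, n + 1)
    )


# Compare the exact sum with the N**0.56 approximation for 1 <= N <= 45.
max_error = 0.0
for n in range(1, 46):
    exact = float(hellerman_bandwidth(n))
    max_error = max(max_error, abs(n**0.56 - exact) / exact)

for n in (1, 2, 4, 8, 16, 32):
    print(n, float(hellerman_bandwidth(n)), n**0.56)
print("largest relative deviation:", max_error)
```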

Knuth and Rao [13] reduced Hellerman's expression to a closed form.


Burnett and Coffman [14, 15] extended Hellerman's model by
separating the instruction requests from the data requests and showed that
the system bandwidth can be increased considerably because of the
locality in programs due to the sequential nature of instructions. This
was modeled by introducing two parameters, $\alpha$ and $\beta$, where
$\alpha$ is the probability of a request addressing the next module in
sequence (modulo N) and $\beta = (1-\alpha)/(N-1)$ is the probability of
addressing each other module out of sequence. It was found analytically
that the bandwidth increases exponentially with increase in $\alpha$.
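The two-parameter address model can be written out as a probability vector. This is only a sketch of the distribution described above, not of Burnett and Coffman's bandwidth analysis; the function name and the sample values are hypothetical.

```python
from fractions import Fraction


def next_module_distribution(current, alpha, n):
    """Probability that the next request addresses each of the n modules.

    The module following `current` (modulo n) is addressed with probability
    alpha; each of the other n - 1 modules with beta = (1 - alpha)/(n - 1).
    """
    if n < 2:
        raise ValueError("the model needs at least two modules")
    beta = (1 - alpha) / (n - 1)
    probs = [beta] * n
    probs[(current + 1) % n] = alpha
    return probs


# Hypothetical values: 4 modules, last request went to module 3, alpha = 3/4.
probs = next_module_distribution(current=3, alpha=Fraction(3, 4), n=4)
print(probs, "sum =", sum(probs))
```

Setting $\alpha = 1/N$ gives $\beta = 1/N$ as well, which recovers the uniform addressing assumed in Hellerman's model.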

The above models seem to assume a single processor with
instruction look-ahead capabilities; hence they will be called
overlap-processor models. One of the first analytic models for a
multiprocessor system was proposed by Skinner and Asher [16]. The analysis
made use of a discrete Markov chain model and was limited to a small
number of processors ($\le 2$) because of the complexity involved for
larger systems.

Strecker [17] investigated the conflict problem in a multiprocessor
system with p processors and N memory modules in which the processor
cycle, that is, the segment time unit, is equal to the pulse width of
the output data in a read memory cycle. By approximate analysis, a
closed form representation of the bandwidth was obtained as

$$N\left[1 - \left(1 - \frac{1}{N}\right)^{p}\right].$$




Ravi [18] has a similar model which he analyzed using combinatorics
to arrive at the bandwidth as

$$\sum_{K=1}^{t} \frac{K \cdot K! \, S(p, K) \binom{N}{K}}{N^{p}},$$

where t = min (p, N) and S(p, K) are Stirling numbers of the second
kind. It is interesting to note that Strecker's formula is a closed form
representation of Ravi's formula. Bhandarkar [19] expanded on Strecker's
results. Sastry and Kain [20] had similar models but also investigated
the performance using different storage allocations for instructions
and data with interleaving only in the instruction space. Baskett and
Smith [21] have also investigated the memory conflict problem in
multiprocessor systems.
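The remark that Strecker's expression is a closed form of Ravi's sum can be checked for small systems. The sketch below assumes Ravi's sum is the expected number of distinct modules addressed by p uniform requests, with $\binom{N}{K} K!\, S(p,K) / N^p$ as the probability of exactly K distinct modules; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def stirling2(p, k):
    """Stirling number of the second kind via inclusion-exclusion."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** p for j in range(k + 1))
    return total // factorial(k)


def ravi_bandwidth(p, n):
    """Sum over K of K * K! * S(p, K) * C(N, K) / N^p, K = 1..min(p, N)."""
    return sum(
        Fraction(k * factorial(k) * stirling2(p, k) * comb(n, k), n**p)
        for k in range(1, min(p, n) + 1)
    )


def strecker_bandwidth(p, n):
    """Closed form N * (1 - (1 - 1/N)^p), kept exact with Fraction."""
    return n * (1 - (1 - Fraction(1, n)) ** p)


# The two expressions agree exactly for every small (p, N) tried here.
agree = all(
    ravi_bandwidth(p, n) == strecker_bandwidth(p, n)
    for p in range(1, 7)
    for n in range(1, 7)
)
print("Ravi == Strecker for p, N in 1..6:", agree)
```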

One common characteristic of the multiprocessor models discussed
above with respect to the parallel-pipelined processor proposed here is
that the multiprocessor systems are all parallel-pipelined processors
of order (1, p) having access to an N-module memory system with module
characteristics (a, c) = (1, 1). In this research, the memory conflict
problem and hence the bandwidth is investigated for more general
multiprocessor models, namely, systems encompassing a wide variety of
parallel-pipelined processors of order (s, p), $(\ell, m)$ interleaved
memory configurations, and memory module characteristics (a, c). The
parameters s, p, $\ell$, m, a, c are chosen such that $1 \le a \le c$,
$1 \le c \le s$, $p \ge 1$, and $\ell$ and m are nonnegative integer
powers of 2 with $\ell m = 2^n = N$.




1.5  Problem  Statement 

The  purpose  of  this  research  is: 

(i) To investigate the effect of memory interference on system
performance for a variety of relative module characteristics
(a, c) and memory configurations $(\ell, m)$.

(ii) To study the effect of buffering memory requests when
conflict occurs.

(iii) To give some methodology and design tradeoffs for a
cost-effective memory configuration.

(iv) To obtain guidelines for desirable semiconductor memory
organization for parallel-pipelined multiple instruction
stream processors.

1.6  Overview  of  Dissertation 

So far, the background material that motivated this research has
been presented. In Chapter 2, the memory organization is developed.
The module access conditions are also outlined. In Chapter 3, the
performance of the memory organization is investigated for the
parallel-pipelined processor of order (s, p). Discrete Markov models are
developed to aid in the performance analysis of the case where p = 1.
State reduction and line decomposition techniques are introduced to reduce
the complexity and size of the state diagrams developed for p = 1. In
this chapter, it is also shown that the complexity of the performance
analysis grows with $c/a$ for a > 1. Using the line state transition
diagram for p = 1, closed form solutions of the probability of
acceptance for p > 1 are obtained for a class of module
characteristics. Furthermore, bounds on the performance are also obtained
for any set of module characteristics. In Chapter 4, simulation models
are developed for two different processor schemes, namely,
nonbuffered and buffered request processor systems. In particular,
these schemes highlight two possible ways of handling requests when
conflict occurs. In Chapter 5, the effects of the various parameters
on performance are investigated. In addition, some design
tradeoffs are studied. The burst mode of operation, which dispatches
simultaneously all requests issued in one memory cycle at the end
of the cycle, is compared to the multiplex mode of operation, which
dispatches requests as they are issued, that is, every segment time
unit. Chapter 6 presents overall conclusions and prospects for
further research.




2. THE L-M MEMORY ORGANIZATION

2.1  Introduction 

In the previous chapter, a semiconductor random access memory
chip was characterized by its address and memory cycle times, $t_a$ and
$t_c$ respectively. Many large scale integrated RAM chips have an
address cycle significantly shorter than the memory cycle time. This
difference is increased when an address latch is incorporated within
the chip to gate in the address. Similarly, write data hold time may
be short and read data may be enabled in a controllable window. It
was assumed that the address is typically held on the address bus at
least as long as data is held on the data bus. Hence the data busses do
not pose a limiting constraint and are not explicitly considered in
the memory organization discussed here. In this discussion, a line
is used to denote an address bus within the memory. Hence, assuming
that there are N identical memory modules in the memory, there can be
up to N independent lines in the memory.

In this chapter, a memory organization is developed to exploit
the capabilities of the memory chips discussed in Chapter 1. For a
system environment with $s \cdot p$ distinct instruction streams running
concurrently, the absolute size of the main memory, M, would be expected
to be large enough to accommodate at least the working set [22] of the
$s \cdot p$ distinct processes. Since the research focuses on the memory
conflict problem, page faulting is not modeled here.


With respect to present day practice, the total number of words,
z, in a memory module is small. Hence for a large main memory of size
M, the number of memory modules N will probably be large, since M = Nz.
The memory organizations for multiprocessor systems discussed by
previous authors have N lines for N independent memory modules. Although
the performance of such a memory organization is good, the line cost is
usually high. Assuming that each module contains an address latch, a
module in which a memory operation is initiated uses its associated
line for a duration much less than one memory cycle per access.
Therefore, more than one module can share a line, thereby increasing the
line utilization and reducing the line cost. However, as a consequence of
line-sharing, the performance is degraded. Furthermore, a more complex
bussing scheme is required as additional conflict situations are
introduced. In the one-dimensional memory organization (that is, one
module per line), the memory conflict problem arises when two or more
simultaneous memory requests reference the same memory module and hence
the same line. For the two-dimensional memory discussed here, a conflict
may also occur when a memory request references a busy line or a busy
module on a line. In general, the degree of performance degradation for
an associated degree of line-sharing is not intuitively obvious. Hence
we will study the tradeoffs and relationships of line-sharing and
performance.

2.2  Memory  Configuration 


The memory and processor operation described so far suggest that
the memory system can be partitioned in such a way as to accommodate
the possible arrival rate of p requests per STU for a parallel-pipelined
processor of order (s, p). Each of the $s \cdot p$ streams in the
processor issues one request every s STUs. The memory system can be
developed from the basic topology of memory system configuration for a
multiprocessor system in which the number of lines and the number of
memory modules are equal. Because of the multiplicity of processing
elements, more than one line is usually necessary to avoid excessive
memory conflicts. If the segment time and the address and memory cycle
times are all equal, then no line sharing is required because whenever
a module is active, its associated line is also active. However, if
$\tau \le t_a < t_c$, then whenever a module is active for a memory
cycle $t_c$, its associated line is active for only a fraction of the
memory cycle. By multiplexing a group of modules on a line, the period
for which the line is inactive may be used to broadcast the address
of a new request which references an inactive module on the line. Hence
some lines can be eliminated and their associated modules equally
distributed and multiplexed with modules on other lines. In the memory
organization to be discussed, the modules are organized in a
two-dimensional matrix, with line i and module j on line i referred to as
$L_i$ and $M_{i,j}$ respectively. This organization, referred to as the
L-M memory organization, is shown in Figure 2.2.1. The memory organization
consists of $N = 2^n$ identical memory modules arranged such that there
are $\ell$ lines and m modules per line, where $\ell = 2^b$ and
$m = 2^{n-b}$ for integers b and n such that $0 \le b \le n$. Recall
that a line refers to the address bus common to a set of m modules.
Figure 2.2.2 shows the bus structures of the memory organization. Each
set of modules on a line, in addition to sharing the same address bus,
shares the same data input and data output busses. That is, there is one
each of address, data input, and data output busses for the set of m
memory modules on each line.


Given the memory address of a word in memory, as shown in Figure
2.2.3, the least significant n bits of the address determine the line
and module segments that are required to access the word, and the
higher order bits of the address determine the address of the word
within the selected module. The b least significant bits of the n
bits address one of the $2^b$ lines, $L_i$, and the next higher order
n - b bits address one of the $2^{n-b}$ modules on line i, $M_{i,j}$.
Hence, the modules are interleaved on the low order n bits and the lines
on the low order b bits. This scheme, as will be shown in the next
chapter, tries to maximize the probability that, for p = 1, $a$
successive requests are to distinct lines and $c$ successive requests are
to distinct modules.

Figure 2.2.4 shows the reservation table for a parallel-pipelined
processor of order (s, p) = (7, 1) having access to line $L_i$ and module
$M_{i,j}$ of a memory system whose module characteristics are
(a, c) = (2, 4). Any computation initiated at time t runs straight
through the 7 processor segments during <t, t + 7> and uses some line
segment throughout <t + 1, t + 3> and some module segment on that line
throughout <t + 1, t + 5>. The four processor segments traversed during
the memory access may be used simply to retain the states of the
processes as they access memory. Information from these segments can
also be used to control certain functions in the memory system. Note
that the operations in the line and module segments cannot be preempted.
All tasks executed by the processor are identical in this respect except
for the values of i and j.

Figure 2.2.2 Bus structures of the memory organization (one address
bus, one data input bus, and one data output bus per line)

Figure 2.2.3 Memory address word (fields, from most to least
significant: address in module; module number on that line; line number)

Figure 2.2.4 Reservation table for parallel-pipelined processor of
order (7, 1) (rows: processor segments, line $L_i$, module $M_{i,j}$;
columns: time t through t + 6)

For brevity, a memory configuration characterized by $(\ell, m)$ is
a particular realization of the L-M memory organization discussed
above, where the number of lines is $\ell = 2^b$ and the number of
modules per line is $m = 2^{n-b}$. For example, if b = 3 and n = 5, then
$\ell = 8$ and m = 4. Hence we have a memory configuration of
$(\ell, m) = (8, 4)$.
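The address split described in Section 2.2 can be sketched with a few bit operations. The function name and the sample address are hypothetical; the field layout (line number in the b lowest bits, module number in the next n - b bits, address within the module above that) follows the text.

```python
def decode_address(addr, b, n):
    """Split a word address into (line, module on that line, offset in module).

    b: number of line-select bits, so there are 2**b lines.
    n: total number of interleave bits, so there are 2**(n - b) modules
       per line and 2**n modules in all.
    """
    line = addr & ((1 << b) - 1)
    module = (addr >> b) & ((1 << (n - b)) - 1)
    offset = addr >> n
    return line, module, offset


# The (8, 4) configuration from the text: b = 3, n = 5.
sample = (11 << 5) | (2 << 3) | 5  # offset 11, module 2, line 5
print(decode_address(sample, b=3, n=5))

# Consecutive addresses sweep across the 8 lines before reusing one.
print([decode_address(x, 3, 5)[0] for x in range(10)])
```

Because the line number occupies the lowest bits, a run of consecutive addresses visits all lines before returning to the first, and visits all N modules before reusing any module.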

2.3  Memory  Request  Scheduling 

Each pipelined processor issues one memory request every STU and
p parallel processors issue p simultaneous requests each STU. Of
these p parallel requests, some might address the same line,
resulting in a conflict. Even when all p simultaneous requests address
distinct lines, conflict can still result if a request addresses a line
or module which is still executing a previous request. Such a line or
module which is executing a request at time t is said to be busy or
active at time t. If a line or module is not busy, it is said to be
idle or inactive.

Definition 2.3.1 A memory request collision is said to occur when
a memory request attempts to access a busy line or module, or when
at least two simultaneous requests attempt to access the same line. □

When more than one request attempts to access the same line
simultaneously, a multiple access line collision occurs. When a request
attempts to access a busy line, a line collision occurs. Similarly, a
module collision occurs when a request attempts to access a busy module.


Definition 2.3.2 The status of module $M_{i,j}$ at time t is $L_i$ busy
or idle at t, and $M_{i,j}$ busy or idle at time t.

The status of a memory module addressed by a request is required to
determine the outcome of the request. A request can access module
$M_{i,j}$ at t if and only if $M_{i,j}$ and its line $L_i$ are both idle
at t. Hence a request is rejected if it addresses a busy line or module.
However, if one or more simultaneous requests refer to idle modules on
the same idle line, one of these is accepted and the others rejected.
One method of handling rejected requests is to recycle them through their
corresponding processor segments for one memory cycle and resubmit them
as new memory requests one instruction cycle later. During the recycling
of each rejected request, a flag is set in the process of the rejected
request to deactivate execution of that process until the request is
accepted, whereupon the flag is reset and execution is reactivated.

Following the initiation of a memory operation at time t on line
$L_i$ and module $M_{i,j}$, $L_i$ is busy with respect to other requests
at times t, t + 1, ..., t + a - 1. Similarly, $M_{i,j}$ is busy with
respect to other requests at times t, t + 1, ..., t + c - 1. Hence $L_i$
and $M_{i,j}$ remain busy in the intervals <t, t + a> and <t, t + c>
respectively.



When a multiple access line collision occurs on one line, only
one of the requests may be accepted. A request is termed an
acceptable request if it addresses an idle module on an idle line. If
there is only one such request for a line, the request will be
accepted. However, when there is more than one such acceptable request,
one of them is accepted arbitrarily and the others rejected. It
should be pointed out that the acceptance of a request may depend on
whether the busy module rejection is by the module itself or is
centralized. In practice, as illustrated by the simulation model
discussed in Chapter 4, a priority scheme may be adopted to select one of
the acceptable requests to be accepted. One such priority scheme assigns
a distinct priority to each processor so that any request issued by a
particular processor has, associated with it, the processor's priority
number. Another scheme selects one of the acceptable requests according
to a round-robin processor priority assignment.
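The two priority schemes can be contrasted with a small sketch. The text does not specify either policy in detail, so the conventions here are assumptions: under fixed priority the lowest processor number wins, and under round robin the first contender after the previous winner (in cyclic processor order) wins. All names are hypothetical.

```python
def select_fixed_priority(contenders):
    """Fixed scheme: the contender with the lowest processor number wins."""
    return min(contenders)


def select_round_robin(contenders, last_winner, p):
    """Round-robin scheme: the first contender after last_winner wins.

    Processors are numbered 0..p-1 and searched cyclically starting at
    last_winner + 1, so a recent winner drops to the lowest priority.
    """
    return min(contenders, key=lambda q: (q - last_winner - 1) % p)


# Processors 0, 2, and 3 (out of p = 4) contend for one line in each of
# four successive arbitration rounds.
contenders, p = [0, 2, 3], 4
last, round_robin_winners = p - 1, []
for _ in range(4):
    last = select_round_robin(contenders, last, p)
    round_robin_winners.append(last)

print("fixed priority :", [select_fixed_priority(contenders)] * 4)
print("round robin    :", round_robin_winners)
```

Under these assumptions the fixed scheme always favors processor 0, whereas the round-robin scheme rotates the grant among the contenders.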

In summary, there are basically three different types of stumbling
blocks which can deter a request from being accepted by the memory system.
A request may be rejected due to

(1) multiple access line collision, which may occur only if p > 1,

(2) line collision, which may occur only if a > 1, and

(3) module collision due to a busy module on an idle line,
which may occur if c > a.
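The acceptance rule and the three rejection causes can be exercised with a minimal bookkeeping sketch. It assumes that the busy checks use the status at the start of the STU, that a line collision is reported in preference to a module collision, and that the earliest-listed acceptable request on a line wins; the class and method names are hypothetical.

```python
class LMMemory:
    """Busy bookkeeping for an (l, m) configuration with characteristics (a, c)."""

    def __init__(self, lines, modules_per_line, a, c):
        self.a, self.c = a, c
        # First STU at which each line / module is idle again.
        self.line_idle_at = [0] * lines
        self.module_idle_at = [[0] * modules_per_line for _ in range(lines)]

    def submit(self, t, requests):
        """Classify simultaneous (line, module) requests issued at STU t."""
        outcomes, winners = [], {}
        for line, module in requests:
            if self.line_idle_at[line] > t:
                outcomes.append("line collision")
            elif self.module_idle_at[line][module] > t:
                outcomes.append("module collision")
            elif line in winners:
                outcomes.append("multiple access line collision")
            else:
                winners[line] = module
                outcomes.append("accepted")
        # Accepted requests hold the line for a STUs and the module for c STUs.
        for line, module in winners.items():
            self.line_idle_at[line] = t + self.a
            self.module_idle_at[line][module] = t + self.c
        return outcomes


mem = LMMemory(lines=2, modules_per_line=2, a=2, c=4)
log = [
    mem.submit(0, [(0, 0), (0, 1)]),  # two requests to the same idle line
    mem.submit(1, [(0, 1)]),          # line 0 still busy (a = 2)
    mem.submit(2, [(0, 0)]),          # line idle, module still busy (c = 4)
    mem.submit(3, [(0, 1)]),          # idle module on an idle line
]
for t, outcome in enumerate(log):
    print(t, outcome)
```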

2.4 Processor-Memory Interconnection


On the arrival of p simultaneous requests at time t, the following
situations must be checked before any request is accepted.

(i) The status of each module referenced by each request.

(ii) Multiple access line collisions on lines associated
with referenced modules.

When  these  items  have  been  checked  and  all  conflicts  resolved, 
the  accepted  requests  must  be  routed  to  the  appropriate  lines  and 
modules.  Basically,  functional  units  are  required  to  (1)  maintain 
status  of  currently  busy  lines  and  modules;  (2)  resolve  multiple 
access  line  collisions;  and  (3)  successfully  route  accepted  addresses 
to  referenced  lines. 

Details of such functional units pose a major problem in the
design of real parallel-pipelined computer systems. There exists a wide
variety of possible implementations of such devices, and a particular
choice depends on the designer's objectives.

The functional unit required to store and update currently busy
module status may also be required to accept or reject incoming
requests. Such a unit would perform a module and line busy check for
each of the incoming requests. Requests to idle modules on idle lines
would then be steered through a p-to-$\ell$ crossbar switch which would
arbitrate requests to the same line. The crossbar dimensions are,
ideally, significantly smaller than those of the p-to-N crossbar required
by previous memory system organizations (with s = 1 and $\ell = N$).

Two possible implementations of busy check hardware are readily
apparent. Each involves a small memory to store module busy status
and another for line busy status. The first scheme employs N shift
registers of c - 1 bits each and $\ell$ shift registers of a - 1 bits
each as the two memories. These are shifted right once per STU with 1's
introduced on the left. The least significant bits of the p requested
locations are read. An addressed module or line is busy if 0 is
read from either register. In such a case, the corresponding request
is rejected. If a forwarded request is accepted by the crossbar switch,
the corresponding module and line shift registers are cleared to 0.
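The shift-register scheme can be simulated for a single module. The sketch assumes a particular ordering within each STU (the least significant bit is read first, then the register is shifted) and that the register is cleared at the end of the STU in which a request is accepted; the class and function names are hypothetical.

```python
class BusyRegister:
    """A (c - 1)-bit module register or (a - 1)-bit line register."""

    def __init__(self, width):
        self.width = width
        self.bits = (1 << width) - 1  # all 1's: idle

    def is_busy(self):
        # Busy when a 0 is read from the least significant bit.
        return self.width > 0 and (self.bits & 1) == 0

    def shift(self):
        # Shift right once per STU with a 1 introduced on the left.
        if self.width:
            self.bits = (self.bits >> 1) | (1 << (self.width - 1))

    def clear(self):
        # Called when a request to this module or line is accepted.
        self.bits = 0


def busy_trace(cycle, stus):
    """Busy flags read at STUs t+1, t+2, ... after an acceptance at STU t."""
    reg = BusyRegister(cycle - 1)
    reg.clear()
    trace = []
    for _ in range(stus):
        trace.append(reg.is_busy())
        reg.shift()
    return trace


# Module with c = 4: busy at t+1, t+2, t+3 and idle again from t+4.
print(busy_trace(4, 5))
# Line with a = 2: busy at t+1 only.
print(busy_trace(2, 3))
```

Under this ordering the registers reproduce the busy intervals <t, t + c> and <t, t + a> of Section 2.3, which is why c - 1 and a - 1 bits suffice.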

An alternative implementation employs two multi-access content
addressable memories (CAMs), one of size p(c - 1) words by
$\log_2 (N + 1)$ bits to store busy module addresses or "blanks" and one
of size p(a - 1) words by $\log_2 (\ell + 1)$ bits to store busy line
addresses or blanks. Notice that if a = c, only one multi-access CAM is
required and it is used to store the addresses of busy lines. In each
memory, p words are addressed by content each STU. A hit implies that the
line or module referenced is busy. The memories are divided into
successive blocks of p locations. One block, selected in a round-robin
fashion, is overwritten each STU with addresses of new requests or
"blanks" if some requests were not accepted.

One might be concerned that the functional units appear to be
infinitely fast. This assumption is adequate for the model and
simplifies the discussion. In practice, requests should arrive early,
e.g., at t in Figure 2.2.4. The functional unit would then be active
during <t, t + 1> and the line and module activity would be unchanged.
Alternatively, s may be increased for the purpose of adding STUs for
memory management. As will be seen, performance is not a function of s
and is thus not affected by the time required by the accept/reject logic,
provided an adequately pipelined processor is used.


3.  PERFORMANCE  ANALYSIS  OF  L-M  MEMORY  ORGANIZATION 
3.1  Introduction 

In a parallel-pipelined processor of order (s, p), p simultaneous
memory requests can be issued to the memory system every segment time
unit. For analytical purposes, it is assumed that the addresses of the
requests are independent and uniformly distributed among the N identical
memory modules. This assumption yields a conservative estimate of
performance. If in some instruction cycles memory is not referenced,
the performance will be higher than indicated since there would be less
conflict.

Although the $s \cdot p$ instruction streams are independent, a single
instruction stream is often vectoral, referencing distinct lines in
sequence. However, since $s \ge c$, a memory module which is executing a
previous request from an instruction stream would have completed its
execution when the next request from the same instruction stream arrives.
Hence it would appear that, since there is no execution overlap between
instructions of the same stream, program locality will not affect the
performance significantly. As will be seen in the analysis to be
presented, performance variation is largely due to the module
characteristics (a, c), the memory configuration $(\ell, m)$, and the
processor characteristics p and $\tau$.


In order to refine the randomness assumption for the analytical
model, it will also be presumed that rejected memory requests due to
line or module collisions are discarded. The discarding of rejected
requests justifies the assumption that the addresses of the requests
issued are independent. This justification is tested in the simulation
model. In a practical case, one method of handling such collisions is
to cause the process with a rejected memory request to make a
noncomputing pass through the processor segments for one instruction cycle
and to resubmit the request the next cycle. In such cases, the process
is blocked until its request is accepted. Case studies of this and
other practical cases will be discussed fully in Chapter 4.

The performance analysis of the memory organization is carried out
using discrete Markov models. The Markov models are used to analyze
the effect of pipelining in the system. Hence, the models discussed
in the following sections assume that p = 1. The effect of parallelism,
p > 1, on the performance will be discussed in Section 3.5.

The analytical models belong to a class of simple homogeneous
discrete Markov chains with a finite number of mutually exclusive and
exhaustive states [23]. For performance evaluation of given module
characteristics (a, c) and memory configuration $(\ell, m)$, analysis is
oriented toward developing the "probability of acceptance": the
probability of being in any one of a certain class of states.

One inference that can be made directly from the assumption of
uniformly distributed independent addresses is that the probability
of a request addressing any module is 1/N. Similarly, since lines are
identical and independent, the probability of a request addressing a
line is $1/\ell$.


3.2  State  Diagrams  for  p = 1 


First we develop the state space of the memory system for the case
p = 1. In this section, we assume c > 1 since c = 1 is degenerate and
trivial.

Definition 3.2.1 A module state at time t is

= { } = ∅ (null), if the module is idle at t,

= {r}, if the module is busy at t because it accepted a request
r STUs ago, where r is an integer such that $1 \le r \le c - 1$. □

Observe that the state of a module at t which accepted a request c
STUs ago is ∅. Recall that since the memory cycle is c, the module was
busy in the interval <t - c, t>. Since there are m modules on a line,
the states of all m modules on the line represent the state of the line.

Definition 3.2.2 A line state, $\lambda(t)$, at time t is the set union
of all module states at time t for all modules on the line in question.
For convenience, the line state is enclosed in "(" and ")". □

Notice that only nonnull states of modules on the line are required
to specify the state of the line. Note that the line state only
identifies whether some module on the line is in each state, and not
which particular modules are in which state. Specific module information
is not needed because of the assumption of uniform and independent
accessing. Furthermore, there can be at most one module on a line in any
nonnull state; thus there are no repeated nonnull module states and a
simple set union of module states gives the line state. If there is more
than one busy module on the line at time t, the module states are
separated by a comma. For instance, consider the line state of a line at
t which has two busy modules on it, one of which accepted a request one
STU ago and the other, three STUs ago. The line state at t for this line
will be denoted by (1, 3). For convenience, the module states of a line
state will be listed in ascending order of busy time. Hence (3, 1) will
be written as (1, 3). Moreover, if all modules on a line are idle at t,
then the line state is denoted by the empty set, ( ) = ∅.

Since there are $\ell$ lines in the system, the line states of all
$\ell$ lines will specify the state of the system.

Definition 3.2.3 The system state, $\sigma(t)$, at time t is the
unordered set of all nonempty line states at time t for all lines in the
system. The system state is enclosed in "[" and "]". □

If the line state for all lines is ∅, then the system state is
denoted by the set consisting of the empty set, [( )] = [∅]. As an
example of system state, consider the system with two nonempty line
states (1, 3) and (2). The system state will be denoted by [(1, 3)(2)].
For convenience, the line states will be listed in ascending order of the
first module state in each line state.
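The notation of Definitions 3.2.2 and 3.2.3 maps naturally onto sorted tuples. The sketch below is one possible encoding, not the representation used in the analysis; the function names are hypothetical, and each line is described by the list of ages r of its busy modules (idle modules are simply omitted).

```python
def line_state(busy_ages):
    """Line state: the nonnull module states on one line, in ascending order."""
    return tuple(sorted(busy_ages))


def system_state(lines):
    """System state: the nonempty line states, ordered by first module state.

    If every line is idle, the state is the set holding the empty line state.
    """
    nonempty = sorted(s for s in map(line_state, lines) if s)
    return nonempty if nonempty else [()]


def format_state(state):
    """Render a system state in the bracketed notation of the text."""
    return "[" + "".join("(" + ", ".join(map(str, s)) + ")" for s in state) + "]"


# Three lines: one with modules busy for 3 and 1 STUs, one idle, one with a
# module busy for 2 STUs.
print(format_state(system_state([[3, 1], [], [2]])))
# All lines idle.
print(format_state(system_state([[], [], []])))
```

Because only the multiset of line states matters, two systems that differ only in which physical lines are busy map to the same state, which is what keeps the state space manageable.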

Given the state of the system at t, it is necessary to determine
the system state at time t + 1. To make this determination, it is
necessary to understand the change of module states with time. Given the
module state of a module at time t, the module state at time t + 1 can
be evaluated if it is known whether a request is made to and accepted by
that module.

Definition 3.2.4 The next or successor state of a state is the state
at time t + 1 given the state and input at time t. □

Hence for a given module state, the next module state can be obtained
as follows.

Definition 3.2.5 Given that the state of a module at t is $\emptyset$,
the next module state (at t + 1) is

    = {1}, if a request which addressed the module at t was
      accepted, or

    = $\emptyset$ otherwise; i.e., either no request addressed the
      module at t, or a request which addressed the module at t was
      rejected due to a line collision. □

Therefore,  a module  remains  in  the  null  state  unless  it  accepts  a 
request  at  time  t whereupon  it  will  become  busy  and  remain  busy  during 
the  interval  <t,  t + c >.  For  a busy  module,  the  next  state  can  now  be 
evaluated  as  follows: 

Definition 3.2.6 Given that the state of a module at t is {r}, where
r is an integer such that $1 \le r \le c - 1$, the next module state is

    {r + 1} if $r < c - 1$, and $\emptyset$ if $r = c - 1$. □




Once a module accepts a request, it goes through the module
states {1}, {2}, ..., {c - 1}, $\emptyset$. Hence the maximum utilization
of a module is one accepted request per c STUs. Only a module in the
module state $\emptyset$ can accept a request.

It  need  not  be  known  whether  a request  addressed  a busy  module  in 
order  to  evaluate  the  next  state  of  the  module.  Any  request  made  to  a 
busy  module  is  rejected. 
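
Definitions 3.2.5 and 3.2.6 amount to a small transition rule. A minimal
sketch follows, assuming c >= 2 and using the integer 0 to stand for the
null module state; the name next_module_state and this encoding are
illustrative choices, not the report's.

```python
NULL = 0  # stands for the null (idle) module state


def next_module_state(r, c, accepted=False):
    """Next module state for cycle time c (assumed c >= 2).

    r is NULL for an idle module, or the busy time 1..c-1.
    `accepted` only matters for an idle module; a request made to a
    busy module is rejected and does not change the transition.
    """
    if r == NULL:
        return 1 if accepted else NULL
    return r + 1 if r < c - 1 else NULL


# Trace one accepted request through a module with c = 4.
state, trace = NULL, []
state = next_module_state(state, 4, accepted=True)
while state != NULL:
    trace.append(state)
    state = next_module_state(state, 4)
print(trace)
```

The trace visits the busy states in order and then returns to the null
state, which is why at most one request per c STUs can be accepted.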

The line state as defined does not by itself explain the constraining
factor in the formation of a line state. The constraining factor is the
module characteristics (a, c), which imply that a line which accepts
a request will remain busy for $a$ STUs. Hence no two busy modules on a
line will have their module states less than $a$ STUs apart. This leads
to the formal definition of a realizable line state.

Definition 3.2.7 A line state $\lambda(t)$ is said to be a realizable
line state for module characteristics (a, c) if
$\lambda(t) \subseteq \{1, 2, \ldots, c - 1\}$ and

1. it is the null state, $\emptyset$, or

2. it consists of one element, r, or

3. it consists of two or more elements such that
   for any distinct $r_i, r_j \in \lambda(t)$, $|r_i - r_j| \ge a$. □

As an illustration, the following are all the realizable line states
for the module characteristics (a, c) = (2, 6): $\emptyset$, (1), (2),
(3), (4), (5), (1, 3), (1, 4), (1, 5), (2, 4), (2, 5), (3, 5) and
(1, 3, 5).
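
The list above can be reproduced by brute force. The sketch below is an
illustration only (the function name realizable_line_states is
hypothetical): it enumerates every ascending tuple drawn from
1, ..., c - 1 and keeps those whose neighbouring elements are at least a
apart, which for a sorted tuple is equivalent to the pairwise condition
of Definition 3.2.7.

```python
from itertools import combinations


def realizable_line_states(a, c):
    """Enumerate realizable line states for module characteristics (a, c)."""
    states = []
    for k in range(c):  # k busy modules, k = 0 .. c-1
        for combo in combinations(range(1, c), k):
            # combo is ascending, so checking neighbours is sufficient
            if all(y - x >= a for x, y in zip(combo, combo[1:])):
                states.append(combo)
    return states


for state in realizable_line_states(2, 6):
    print(state)
```

The empty tuple plays the role of the null line state.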




Determining  the  next  line  state  is  not  as  straightforward  as  deter- 
mining the  next  module  state.  Hence  some  definitions  are  needed  here  to 
clarify  the  presentation. 

A line state is an acceptance state if it contains the element r = 1.
A particular line state enters an acceptance state one STU after it was
addressed by an accepted request. Line states which are not acceptance
states are called nonacceptance states. Similarly, a system state is an
acceptance state if it contains an acceptance line state; otherwise it
is a rejection state.

There are at most two possible state transitions from a particular
line state, $\lambda(t)$: one is to an acceptance state and the other to
a nonacceptance state. A system state $\sigma(t)$ which is not an empty
state can make at least two possible state transitions, namely, to the
rejection state and to one of its possibly many acceptance states. Since
p = 1, it is obvious that the empty system state [$\emptyset$] can only
make a transition to the acceptance system state [(1)].

Two state transitions, namely, generative and regenerative, are
introduced to aid in generating the state transition graph. Generative
transitions generate new states, whereas regenerative transitions
generate states which have already appeared in the state transition
graph. Hence a transition is regenerative for c > 1 if its source state
contains the module state r = c - 1; otherwise, it is generative. For
generative and regenerative transitions, the source states are called
generator and regenerator states respectively. Note that for c = 1, the
null state is the only state of the system, and the definitions above do
not apply.




For example, if a request references a line with no busy modules
on it, the request is accepted and at the next time instant, a module on
that line would have been busy for one STU. Hence a generative
transition occurs from the null line state, $\emptyset$, to the
acceptance line state, (1). Similarly, if no request referenced that
line, all modules on that line will be idle at the next time instant.
Hence a regenerative transition occurs from the null line state,
$\emptyset$, to the nonacceptance line state, $\emptyset$.

Definition 3.2.8 If $\exists\, r \in \lambda(t)$ such that
$1 \le r < a$, then $\lambda(t)$ is a busy line state; otherwise it is
an idle line state. □

For  a busy  line  state,  the  line  is  busy  and  will  only  make  a 
transition  to  its  nonacceptance  successor  state  whether  or  not  a request 
addresses  the  line.  Notice  that  if  a = 1,  there  are  no  busy  line  states. 
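
Definition 3.2.8 is a one-line predicate. The sketch below is
illustrative only; the names is_busy_line and is_idle_line are
hypothetical, and a line state is assumed to be a tuple of busy times.

```python
def is_busy_line(line, a):
    """True if some module on the line accepted a request fewer than a STUs ago."""
    return any(1 <= r < a for r in line)


def is_idle_line(line, a):
    """True if the line can accept a request (Definition 3.2.8)."""
    return not is_busy_line(line, a)


# With a = 2, the line state (1, 3) is busy while (2) and the null state are idle.
for line in [(1, 3), (2,), ()]:
    print(line, "busy" if is_busy_line(line, 2) else "idle")
```

With a = 1 the range 1 <= r < a is empty, so no line state is ever busy,
in agreement with the remark above.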

In  order  to  develop  the  Markov  model  required  to  analyze  the  memory 
conflict  problem  on  a line,  the  line  state  space  is  investigated.  How- 
ever, we  need  to  know  the  probability  of  transition  from  one  state  to 
the  other  in  order  to  compute  the  probability  of  being  in  either  of  two 
classes  of  states  called  acceptance  and  nonacceptance  states.  The 
cardinality  of  states  is  useful  in  obtaining  the  transition  probabilities. 

Definition 3.2.9 The set of all realizable line states for module
characteristics (a, c) is $\Lambda(a, c)$. Similarly, the set of all
realizable idle line states for module characteristics (a, c) is
$\Lambda_I(a, c)$. □




Hence, $\Lambda_I(a, c) = \{\lambda \mid \lambda \in \Lambda(a, c)
\text{ and if } r \in \lambda \text{ then } r \ge a\} \subseteq
\Lambda(a, c)$. It would be appropriate to represent the set of all
realizable busy line states for module characteristics (a, c) by
$\Lambda_B(a, c) = \Lambda(a, c) - \Lambda_I(a, c)$. Notice that
$\Lambda_B(a, c) \cap \Lambda_I(a, c) = \emptyset$. Recalling that the
system state at time t is composed of all the line states at time t in
the system, each system state consists of two disjoint subsets of line
states, namely, the set of idle line states and the set of busy line
states at t.

Definition 3.2.10 The number of elements or the cardinality of a line
state $\lambda(t)$, $|\lambda(t)|$, is the number of nonnull module
states in the line state. Similarly, the cardinality of the system state
$\sigma(t)$, $|\sigma(t)|$, is the number of nonempty line states in
$\sigma(t)$. □

For example, if $\sigma(t)$ = [$\emptyset$], $|\sigma(t)| = 0$, and |[(1, 3)(5)]| = 2.

Recall  that  when  a request  references  an  idle  module  on  an  idle  line 
the  request  is  accepted,  causing  the  line  state  to  make  a transition  to 
an  acceptance  line  state.  Similarly,  a line  state  makes  a transition  to 
a nonacceptance  line  state  if  no  request  referenced  the  line  or  a request 
which  referenced  the  line  was  rejected.  The  following  definitions  form- 
alize the  above  description. 

Definition 3.2.11 A line acceptance function, $f_a$, maps the idle
line state $\lambda(t) \in \Lambda_I(a, c)$ to its next acceptance line
state, $\lambda(t + 1) \in \Lambda(a, c)$, i.e.,
$f_a(\lambda(t)) = \lambda(t + 1)$. Hence,

    $f_a : \Lambda_I(a, c) \to \Lambda(a, c)$. □




Notice  that  a busy  line  state  does  not  have  a next  acceptance  line 
state. 

Definition 3.2.12 A line nonacceptance function, $f_n$, maps the line
state $\lambda(t) \in \Lambda(a, c)$ to its next nonacceptance line
state, $\lambda(t + 1) \in \Lambda(a, c)$, i.e.,
$f_n(\lambda(t)) = \lambda(t + 1)$. Hence

    $f_n : \Lambda(a, c) \to \Lambda(a, c)$. □

The following theorems in this section are valid for the p = 1 case.

Theorem 3.2.1 If no request is accepted at time t by a line in state
$\lambda(t)$, the next state of that line is the nonacceptance state

    $f_n(\lambda(t)) = \lambda(t + 1) = (x \mid x - 1 \in \lambda(t) \text{ and } x < c)$.

Proof We can apply definitions 3.2.5 and 3.2.6 directly to each
element of the line state $\lambda(t)$ to obtain $f_n(\lambda(t))$.
Since no request is accepted by the line at t, the module state
associated with each module on the line will make a transition to its
next module state. Hence, each module in the null module state makes a
transition to $\emptyset$, and each module in module state {r}, where
$r \in \lambda(t)$ and $1 \le r \le c - 1$, makes a transition to
{r + 1} if $r < c - 1$ and to $\emptyset$ if $r = c - 1$. By
substituting x for r + 1, the theorem follows. □

Corollary 3.2.1.1 For a transition from line state $\lambda(t)$ to its
nonacceptance successor line state $\lambda(t + 1)$,

    $|\lambda(t + 1)| = |\lambda(t)| - 1$, if $c - 1 \in \lambda(t)$
    $|\lambda(t + 1)| = |\lambda(t)|$, otherwise.



Proof If $\forall\, r \in \lambda(t)$, $r < c - 1$, then each element
$r \in \lambda(t)$ is mapped to $r + 1 \in \lambda(t + 1)$. Hence
$|\lambda(t + 1)| = |\lambda(t)|$. However, if
$\exists\, r \in \lambda(t)$ such that $r = c - 1$, then the element
$c - 1 \in \lambda(t)$ is not mapped to an element of $\lambda(t + 1)$.
Hence, $|\lambda(t + 1)| = |\lambda(t)| - 1$ if $c - 1 \in \lambda(t)$.

Notice that if $c - 1 \in \lambda(t)$, then the transition from
$\lambda(t)$ to $f_n(\lambda(t))$ is regenerative; otherwise it is
generative.

The  following  theorem  is  used  to  evaluate  the  next  acceptance  line 
state  of  an  idle  line  state. 

Theorem 3.2.2 If a request is accepted at time t by a line in state
$\lambda(t)$, the next state of that line is

    $f_a(\lambda(t)) = \lambda(t + 1) = (1) \cup (x \mid x - 1 \in \lambda(t) \text{ and } x < c)$.

Proof A request is accepted by the line if the line is idle and a
request references an idle module on the line. The next module state of
the referenced idle module on the idle line is {1}. Concurrently, all
other modules on the line make transitions to their respective next
states. That is, the set of next module states from theorem 3.2.1 is
$(x \mid x - 1 \in \lambda(t) \text{ and } x < c)$. Hence the next state
of the line is $\lambda(t + 1) = (1) \cup f_n(\lambda(t))$. □

Hence if $\lambda(t)$ is an idle line state,
$f_a(\lambda(t)) = (1) \cup f_n(\lambda(t))$.

Corollary 3.2.2.1 For a transition from idle line state $\lambda(t)$ to
its acceptance successor line state $\lambda(t + 1)$,

    $|\lambda(t + 1)| = |\lambda(t)|$, if $c - 1 \in \lambda(t)$
    $|\lambda(t + 1)| = |\lambda(t)| + 1$, otherwise. □

This is obvious from theorem 3.2.2 and corollary 3.2.1.1. Given
a line state, the next line state can therefore be evaluated from the
two theorems above.
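
The two line functions of theorems 3.2.1 and 3.2.2 can be written
directly from their set-builder forms. This is a minimal sketch assuming
c >= 2 and line states stored as ascending tuples; the Python names
f_n and f_a mirror the report's symbols, but the code itself is not
from the report.

```python
def f_n(line, c):
    """Line nonacceptance function: age every busy module by one STU.

    A module whose state would reach c becomes idle and is dropped.
    """
    return tuple(r + 1 for r in line if r + 1 < c)


def f_a(line, a, c):
    """Line acceptance function, defined only for idle line states."""
    if any(r < a for r in line):
        raise ValueError("a busy line state has no acceptance successor")
    return (1,) + f_n(line, c)


# (a, c) = (2, 4): the regenerator (1, 3) loses its element 3 on nonacceptance,
# and the idle line state (2) becomes (1, 3) on acceptance.
print(f_n((1, 3), 4), f_a((2,), 2, 4))
```

The cardinality corollaries can be checked the same way: the tuple
shrinks by one on nonacceptance exactly when it contains c - 1, and
grows by one on acceptance exactly when it does not.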

A request can address only one line at a given time instant. If
the request is accepted on the addressed line, only its corresponding
line state will be transformed to its next acceptance line state, while
all other line states of the system state will be transformed into their
corresponding next nonacceptance states. For example, if (a, c) = (2, 4),
a request which addresses a line represented by the implicit idle line
state $\emptyset$ in the system state [(1)(2)] is accepted, resulting in
the next acceptance line state $f_a(\emptyset) = (1)$. The other line
states, (1) and (2), make transitions to their next nonacceptance states,
$f_n((1)) = (2)$ and $f_n((2)) = (3)$ respectively. Hence the resulting
next system state is [(1)(2)(3)]. However, if the request is made instead
to an idle module on the idle line represented by (2), it is accepted
since 2 = a, and the resulting next acceptance system state is
[(1, 3)(2)]. Furthermore, if the request is made to the busy line
represented by (1), it is rejected and the next rejection system state is
[(2)(3)]. With these illustrations it can be seen that there may be
multiple next acceptance system states.

On the other hand, for any given system state, with the exception of the
null system state, there exists only one next rejection system state.

The null system state generates only the acceptance state [(1)], since
any request which arrives will be accepted by an idle memory system.
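
The three outcomes in the example above can be replayed in a few lines.
The sketch below is illustrative: a system state is assumed to be a
frozenset of line-state tuples, the helper names next_on_rejection and
next_on_acceptance are hypothetical, and the system is assumed to always
have a spare idle line available for the implicit null line state.

```python
def f_n(line, c):
    """Age every busy module on a line by one STU."""
    return tuple(r + 1 for r in line if r + 1 < c)


def next_on_rejection(system, c):
    """Every line makes its nonacceptance transition; empty results vanish."""
    return frozenset(s for s in (f_n(l, c) for l in system) if s)


def next_on_acceptance(system, target, a, c):
    """The addressed idle line `target` accepts; all other lines just age.

    `target` is () for the implicit idle line with no busy modules.
    """
    if any(r < a for r in target):
        raise ValueError("a busy line cannot accept a request")
    others = next_on_rejection(system - {target}, c)
    return others | {(1,) + f_n(target, c)}


sigma = frozenset({(1,), (2,)})                       # [(1)(2)], (a, c) = (2, 4)
print(sorted(next_on_acceptance(sigma, (), 2, 4)))    # request to a null line
print(sorted(next_on_acceptance(sigma, (2,), 2, 4)))  # request to the line (2)
print(sorted(next_on_rejection(sigma, 4)))            # request to the busy line (1)
```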




In  order  to  develop  the  Markov  model  for  the  memory  conflict  prob- 
lem in  the  memory  system  for  p = 1,  the  system  state  space  is  investi- 
gated. The  system  state  space  can  be  obtained  systematically  by  generat- 
ing the  successor  system  states  from  a present  state,  given  that  a request 
was  either  rejected  by  the  system  or  it  was  accepted  by  a particular  idle 
line  in  the  system. 

Definition 3.2.13 S(a, c) is the set of all system states for given
module characteristics (a, c). □

In systematizing the generation of all the system states for module
characteristics (a, c), system acceptance and rejection functions similar
to line acceptance and nonacceptance functions will be defined. Let
$\sigma_\lambda(t + 1)$ represent the acceptance successor system state
of the system state $\sigma(t)$ which accepted a request on an idle line
with at least one idle module, represented by the idle line state
$\lambda(t) \in \sigma(t)$. Assume for the moment that there are at least
$a$ lines in the system. Then, S(a, c) will contain a system state which
has an idle line state which represents an idle line with at least one
idle module.

Definition 3.2.14 A system acceptance function,

    ${}_{\lambda}g_a : S(a, c) \to S(a, c)$,

is such that the request which is accepted on the idle line represented
by the idle line state $\lambda \in \sigma(t)$ causes a transition from
system state $\sigma(t)$ to next acceptance system state
$\sigma_\lambda(t + 1)$. Hence
${}_{\lambda}g_a(\sigma(t)) = \sigma_\lambda(t + 1)$. □




Notice that $f_a(\lambda) \in \sigma_\lambda(t + 1)$. If a request is
made to an empty system, the request will always be accepted. Hence the
next system state of [$\emptyset$] is [(1)]. However, for p = 1, one
request arrives every STU; hence, an empty system cannot cause a
rejection. Therefore, the memory system may reject a request only if the
system state is nonnull. Let $\sigma_n(t + 1)$ represent the rejection
successor system state of the system state $\sigma(t)$.

Definition 3.2.15 A system rejection function

    $g_r : S(a, c) - \{[\emptyset]\} \to S(a, c)$

maps the nonnull system state $\sigma(t) \in S(a, c)$ to its next
rejection state $\sigma_n(t + 1) \in S(a, c)$. Hence,
$g_r(\sigma(t)) = \sigma_n(t + 1)$. □

Hence ${}_{\lambda}g_a$ and $g_r$ are the state transformations
corresponding to an acceptance on the idle line represented by the idle
line state $\lambda$ and a rejection by the system respectively.

Theorem 3.2.3 If a request is rejected at time t by the system in state
$\sigma(t)$, the next state of the system is the rejection system state

    $\sigma_n(t + 1) = [\lambda_n \mid \lambda_n = f_n(\lambda),\ \lambda \in \sigma(t)]$.

Proof The result is directly obtained by application of the line
nonacceptance function to each line state in the system state
$\sigma(t)$, since a system rejection implies nonacceptance on all lines
in the system. □

As an illustration, consider the system state [(1)(2)] for module
characteristics (a, c) = (2, 4). The next rejection system state is
$\sigma_n(t + 1) = [f_n((1))\ f_n((2))]$ = [(2)(3)]. The next rejection
system state for [(2)(3)] is [(3)].

For a = c = 1, the only state of the system is the system state
$\sigma(t)$ = [$\emptyset$]. In this case, the next state of the system
is $\sigma(t + 1)$ = [$\emptyset$]. Hence $|\sigma(t + 1)| = |\sigma(t)|$.
Notice that for p = 1, there is always acceptance when a = c = 1.
However, let us evaluate the cardinality of the next system state when a
rejection occurs, assuming that c > 1 and p = 1.

Corollary 3.2.3.1 For the generative transition from a nonempty system
state $\sigma(t)$ to its next rejection system state $\sigma_n(t + 1)$,

    $|\sigma_n(t + 1)| = |\sigma(t)|$.

Proof If $\sigma(t)$ is a system generator, then each nonempty line
state $\lambda \in \sigma(t)$ is a line generator. Hence,
$f_n(\lambda) \ne \emptyset$. Therefore, the system rejection function,
$g_r$, induces a one-to-one onto mapping of the elements of $\sigma(t)$
to those of $\sigma_n(t + 1)$. Hence $|\sigma_n(t + 1)| = |\sigma(t)|$. □

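Corollary 3.2.3.2 For a regenerative transition from system state
$\sigma(t)$ to its next rejection system state $\sigma_n(t + 1)$,

1. $|\sigma_n(t + 1)| = |\sigma(t)|$, if
   $\exists\, \lambda(t) \in \sigma(t) \mid c - 1 \in \lambda(t)$ and
   $|\lambda(t)| > 1$.

2. $|\sigma_n(t + 1)| = |\sigma(t)| - 1$, if
   $\exists\, \lambda(t) \in \sigma(t) \mid c - 1 \in \lambda(t)$ and
   $|\lambda(t)| = 1$.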

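Proof If $\exists\, \lambda \in \sigma(t) \mid c - 1 \in \lambda$ and
$|\lambda| > 1$, then $f_n(\lambda) \ne \emptyset$. Since when p = 1,
there is at most one line regenerator in a system regenerator,
$|\sigma_n(t + 1)| = |\sigma(t)|$. However, if
$\exists\, \lambda \in \sigma(t) \mid c - 1 \in \lambda$ and
$|\lambda| = 1$, then $f_n(\lambda) = \emptyset$. Hence,
$|\sigma_n(t + 1)| = |\sigma(t)| - 1$. □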

For example, if (a, c) = (2, 4) and $\sigma(t)$ = [(1, 3)(2)], the next
rejection state is $\sigma_n(t + 1)$ = [(2)(3)] and
$|\sigma_n(t + 1)| = |\sigma(t)| = 2$. Notice that the line regenerator,
(1, 3) $\in \sigma(t)$, is such that |(1, 3)| = 2. Suppose
$\sigma(t)$ = [(1)(2)(3)]; the next rejection state is
$\sigma_n(t + 1)$ = [(2)(3)] and $|\sigma_n(t + 1)| = |\sigma(t)| - 1 = 2$.
Notice that in this case, the only line regenerator in $\sigma(t)$ is the
line state (3).
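
The cardinality rule of corollaries 3.2.3.1 and 3.2.3.2 can be checked
numerically. In the sketch below, which is only an illustration, a
system state is assumed to be a frozenset of line-state tuples and
cardinality_drop is a hypothetical helper name.

```python
def f_n(line, c):
    """Age every busy module on a line by one STU."""
    return tuple(r + 1 for r in line if r + 1 < c)


def g_r(system, c):
    """System rejection function: apply f_n to every line state."""
    return frozenset(s for s in (f_n(l, c) for l in system) if s)


def cardinality_drop(system, c):
    """Return |sigma(t)| - |sigma_n(t+1)| for a rejection transition."""
    return len(system) - len(g_r(system, c))


c = 4
generator = frozenset({(1,), (2,)})            # no line state contains c - 1
regen_multi = frozenset({(1, 3), (2,)})        # regenerator line of cardinality 2
regen_single = frozenset({(1,), (2,), (3,)})   # regenerator line of cardinality 1
for sigma in (generator, regen_multi, regen_single):
    print(sorted(sigma), "->", sorted(g_r(sigma, c)), cardinality_drop(sigma, c))
```

Only the last case, where the line regenerator is the single-element
line state (3), loses a line state on rejection.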

Theorem 3.2.4 If a request is accepted at time t on an idle line
represented by the idle line state $\lambda \in \sigma(t)$, the next
state of the system is the acceptance state,

    $\sigma_\lambda(t + 1) = [\lambda_n \mid \lambda_n = f_n(\lambda_i),\ \lambda_i \in \sigma(t) \text{ and } \lambda_i \ne \lambda] \cup [f_a(\lambda)]$.

Proof An acceptance on the line represented by $\lambda$ generates the
line state $f_a(\lambda)$. All other line states in $\sigma(t)$ make
their respective transitions to their next nonacceptance line states.
The set of these next nonacceptance states is
$[\lambda_n \mid \lambda_n = f_n(\lambda_i),\ \lambda_i \in \sigma(t) \text{ and } \lambda_i \ne \lambda]$.
Therefore,
$\sigma_\lambda(t + 1) = [\lambda_n \mid \lambda_n = f_n(\lambda_i),\ \lambda_i \in \sigma(t) \text{ and } \lambda_i \ne \lambda] \cup [f_a(\lambda)]$. □

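Notice that if $\sigma_n(t + 1)$ is the next rejection system state of
$\sigma(t)$, then
$[\lambda_n \mid \lambda_n = f_n(\lambda_i),\ \lambda_i \in \sigma(t) \text{ and } \lambda_i \ne \lambda] = \sigma_n(t + 1) - [f_n(\lambda)]$.
Hence,
$\sigma_\lambda(t + 1) = (\sigma_n(t + 1) - [f_n(\lambda)]) \cup [f_a(\lambda)]$.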

Corollary 3.2.4.1 If the idle line state $\lambda = \emptyset$ or
$\lambda \in \sigma(t)$ is such that $c - 1 \in \lambda$ and
$|\lambda| = 1$,

    $|\sigma_\lambda(t + 1)| = |\sigma_n(t + 1)| + 1$,

where $\sigma_n(t + 1)$ is the next rejection system state of
$\sigma(t)$.

Proof From the above theorem,
$|\sigma_\lambda(t + 1)| = |\sigma_n(t + 1)| - |[f_n(\lambda)]| + |[f_a(\lambda)]|$.
If $\lambda = \emptyset$ or $\lambda \in \sigma(t)$ is such that
$c - 1 \in \lambda$ and $|\lambda| = 1$, $|[f_n(\lambda)]| = 0$. However,
$|[f_a(\lambda)]| = 1$ for any idle line state $\lambda \in \sigma(t)$;
hence the result follows. □



Corollary 3.2.4.2 If $\lambda \ne \emptyset$ and is a generator, or
$\lambda \in \sigma(t)$ is such that $c - 1 \in \lambda$ and
$|\lambda| > 1$,

    $|\sigma_\lambda(t + 1)| = |\sigma_n(t + 1)|$. □

The proof of this is similar to that of corollary 3.2.4.1. However, in
this case, $|[f_n(\lambda)]| = 1$.

For example, if (a, c) = (2, 4), let $\sigma(t)$ = [(1)(2)]. The next
rejection system state of [(1)(2)] is $\sigma_n(t + 1)$ = [(2)(3)]. For
this system state there are two distinct idle line states, namely,
$\lambda_0 = \emptyset$ and $\lambda_1 = (2)$. Hence the exhaustive next
acceptance system states of [(1)(2)] are

    $\sigma_{\emptyset}(t+1) = [f_a(\emptyset)] \cup (\sigma_n(t+1) - [f_n(\emptyset)])$
        = [(1)] $\cup$ ([(2)(3)] - [$\emptyset$]) = [(1)(2)(3)]

and

    $\sigma_{(2)}(t+1)$ = [(1, 3)] $\cup$ ([(2)(3)] - [(3)]) = [(1, 3)(2)].

Tables 3.2.1 and 3.2.2 summarize the next state and its cardinality as a
function of the present state for line states and system states
respectively.
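
The identity used in this example, building each acceptance successor
from the single rejection successor, lends itself to a short sketch. The
code is illustrative only: acceptance_successors and idle_line_states
are hypothetical names, system states are frozensets of line-state
tuples, and a spare null line is assumed to be always available.

```python
def f_n(line, c):
    """Age every busy module on a line by one STU."""
    return tuple(r + 1 for r in line if r + 1 < c)


def g_r(system, c):
    """System rejection function."""
    return frozenset(s for s in (f_n(l, c) for l in system) if s)


def idle_line_states(system, a):
    """Idle line states of a system state, including the implicit null state ()."""
    return [()] + sorted(l for l in system if all(r >= a for r in l))


def acceptance_successors(system, a, c):
    """Map each idle line state to (sigma_n - [f_n(lam)]) U [f_a(lam)]."""
    sigma_n = g_r(system, c)
    result = {}
    for lam in idle_line_states(system, a):
        f_a_lam = (1,) + f_n(lam, c)
        result[lam] = (sigma_n - {f_n(lam, c)}) | {f_a_lam}
    return result


for lam, nxt in acceptance_successors(frozenset({(1,), (2,)}), 2, 4).items():
    print(lam, "->", sorted(nxt))
```

Computing the rejection successor once and patching it per idle line is
a design convenience of this sketch; applying the acceptance function
directly to each line gives the same states.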

Hence from the last two theorems, it is obvious that if a state
$\sigma(t)$ can make a transition to $\sigma_n(t+1) = g_r(\sigma)$, it
can also make a transition to
$\sigma_\lambda(t+1) = {}_{\lambda}g_a(\sigma)$, for each idle line state
$\lambda \in \sigma(t)$. It is interesting to note that for generative
transitions, there is only one state in the system which makes a
transition to a particular nonnull rejection system state.




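TABLE 3.2.1  Next line state (NLS) and its cardinality as a function of
present line state (PLS)

  PLS:          $\lambda(t)$
  Outcome:      nonacceptance (function $f_n$)
  NLS:          $\lambda(t+1) = f_n(\lambda(t)) = (x \mid x - 1 \in \lambda(t) \text{ and } x < c)$
  Cardinality:  $|\lambda(t+1)| = |\lambda(t)| - 1$, if $c - 1 \in \lambda(t)$;
                $|\lambda(t+1)| = |\lambda(t)|$, otherwise

  PLS:          $\lambda(t)$
  Outcome:      acceptance (function $f_a$)
  NLS:          $\lambda(t+1) = f_a(\lambda(t)) = (1) \cup f_n(\lambda(t))$
  Cardinality:  $|\lambda(t+1)| = |\lambda(t)|$, if $c - 1 \in \lambda(t)$;
                $|\lambda(t+1)| = |\lambda(t)| + 1$, otherwise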

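TABLE 3.2.2  Next system state (NSS) and its cardinality as a function
of present system state (PSS).

  PSS:          $\sigma(t) \ne [\emptyset]$
  Outcome:      rejection (function $g_r$)
  NSS:          $g_r(\sigma(t)) = \sigma_n(t+1) = [\lambda_n \mid \lambda_n = f_n(\lambda),\ \lambda \in \sigma(t)]$
  Cardinality:  $|\sigma(t)|$, if each $\lambda \in \sigma(t)$ is a line
                generator, or $\exists$ a line regenerator
                $\lambda \in \sigma(t) \mid |\lambda| > 1$;
                $|\sigma(t)| - 1$, if $\exists$ a line regenerator
                $\lambda \in \sigma(t) \mid |\lambda| = 1$

  PSS:          $\sigma(t)$
  Outcome:      acceptance on line state $\lambda \in \sigma(t)$
                (function ${}_{\lambda}g_a$)
  NSS:          ${}_{\lambda}g_a(\sigma(t)) = \sigma_\lambda(t+1) = [\lambda_n \mid \lambda_n = f_n(\lambda_i),\ \lambda_i \in \sigma(t) \text{ and } \lambda_i \ne \lambda] \cup [f_a(\lambda)]$
  Cardinality:  $|\sigma_n(t+1)| + 1$, if $\lambda = \emptyset$ or
                $\lambda$ is a line regenerator $\mid |\lambda| = 1$;
                $|\sigma_n(t+1)|$, if $\lambda$ is a line generator
                $\mid \lambda \ne \emptyset$, or $\lambda$ is a line
                regenerator $\mid |\lambda| > 1$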




That  is,  there  are  no  two  rejection  states  which  are  successors  of  the 
same  state.  However,  there  may  exist  several  acceptance  states  in  the 
system  which  are  successors  of  the  same  state.  As  will  be  seen  later 
for  regenerative  transitions,  more  than  one  regenerative  system  state 
may  make  a transition  to  the  same  system  state. 

The set of all system states, S(a, c), for given (a, c) can be
generated systematically by the application of the system acceptance and
rejection functions. In addition, a system state graph can be used to
display the next system state function. In this graph, there is one
node for each system state in S(a, c) and one edge leaving each node
for each possible outcome of a request. The paths on the graph show
the state changes experienced by the system for all request sequences.
In particular, the directed edge indicates a transition from one state
at time t to another at time t + 1. An algorithm to generate S(a, c) and
obtain a system state graph starting from the null system state will now
be developed.

Let $V_0 = \{[\emptyset]\}$. Generally, $V_n$ is the set of system
states generated by one-step transitions from the states in $V_{n-1}$.
For example, $V_1$ is the set of system states generated by the null
system state, [$\emptyset$], which is the only element in $V_0$. The set
of rejection states generated by the states in $V_{n-1}$ is called $R_n$,
where $R_n = \{\sigma_j \mid \sigma_j = g_r(\sigma_i),\ \sigma_i \in V_{n-1}\}$.
Similarly, the set of acceptance states generated by the states in
$V_{n-1}$ is called $A_n$, where
$A_n = \{\sigma_\lambda \mid \sigma_\lambda = {}_{\lambda}g_a(\sigma)$,
for some idle line state $\lambda \in \sigma$ and $\sigma \in V_{n-1}\}$.
Notice that the null line state, $\emptyset \in \sigma$, is an idle line
state. Then $V_n = A_n \cup R_n$. Notice that $R_1$ is an empty set since
$g_r([\emptyset])$ does not exist. Hence $R_1 = \emptyset$ and
$V_1 = A_1 = \{[(1)]\}$. Similarly, $A_0 = \emptyset$, while
$R_0 = \{[\emptyset]\} = V_0$. Notice that in general $\sigma \in V_n$ if
and only if $\exists\, \lambda \in \sigma$ such that $n \in \lambda$, for
$1 \le n \le c - 1$. It will be shown that $V_{c-1}$ is the set of all
regenerative system states for given (a, c). However, let us introduce
some definitions and lemmas to aid in the development.

Definition 3.2.16 For generative transitions, the inverse line function,
$f^{-1}$, maps the line state $\lambda(t) \in \Lambda(a, c)$ to its
predecessor line state $\lambda(t - 1) \in \Lambda(a, c)$, i.e.,
$f^{-1}(\lambda(t)) = \lambda(t - 1)$. Hence,
$f^{-1} : \Lambda(a, c) \to \Lambda(a, c)$. □

The inverse line function is defined for generative transitions
only. In this case, the predecessor state is unique. That is, if all
regenerative transitions are ignored, there exists one and only one
predecessor state for each line state in $\Lambda(a, c)$. Notice that
$f^{-1}$ is neither an onto nor a one-to-one mapping, since no state is
mapped to regenerative line states.

Lemma 3.2.1 For generative transitions, the predecessor state of a
line state $\lambda(t)$ is
$\lambda(t-1) = f^{-1}(\lambda(t)) = (x \mid x + 1 \in \lambda(t) \text{ and } x > 0)$.

Proof The proof of this is straightforward from theorems 3.2.1 and
3.2.2, since $\lambda(t) = f_n(\lambda(t-1))$ or
$\lambda(t) = f_a(\lambda(t-1))$. □

Hence for the (a, c) = (2, 4) example, if $\lambda(t)$ = (1, 3),
$f^{-1}(\lambda(t))$ = (2). Similarly, if $\lambda(t)$ = (2),
$f^{-1}(\lambda(t))$ = (1).
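
Lemma 3.2.1 translates into a one-line function. The sketch below is an
illustration with a hypothetical name, f_inv, and line states stored as
ascending tuples; it is not code from the report.

```python
def f_inv(line):
    """Inverse line function: step every busy time back by one STU.

    An element 1 has no predecessor (the module was idle one STU ago),
    so it is dropped.
    """
    return tuple(r - 1 for r in line if r - 1 > 0)


print(f_inv((1, 3)), f_inv((2,)), f_inv((1,)))
```

The same predecessor is obtained whether the line reached its present
state by an acceptance or by a nonacceptance, which is why a single
inverse suffices for generative transitions.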

Similarly,  for  generative  transitions,  we  can  determine  the  pre- 
decessor state  of  a nonnull  system  state  from  the  following  definition 
and  lemma. 




Definition 3.2.17 For generative transitions, the inverse system
function, $g^{-1}$, maps the nonempty system state
$\sigma(t) \in S(a, c)$ to its predecessor state
$\sigma(t-1) \in S(a, c)$, i.e., $g^{-1}(\sigma(t)) = \sigma(t-1)$. Hence
$g^{-1} : S(a, c) - \{[\emptyset]\} \to S(a, c)$. □

$g^{-1}$ is neither an onto nor a one-to-one mapping, since no state is
mapped to regenerative system states.

Lemma 3.2.2 For generative transitions, the predecessor state of a
nonempty system state $\sigma(t)$ is

    $\sigma(t-1) = g^{-1}(\sigma(t)) = [\lambda \mid \lambda = f^{-1}(\lambda_i),\ \lambda_i \in \sigma(t)]$.

Proof The proof of this can be obtained directly from theorems 3.2.3
and 3.2.4 and Lemma 3.2.1, since $\sigma(t) = g_r(\sigma(t-1))$ or
$\sigma(t) = {}_{\lambda}g_a(\sigma(t-1))$, for some idle line state
$\lambda \in \sigma(t-1)$. □

For the (a, c) = (2, 4) example, if $\sigma(t)$ = [(3)] or [(1, 3)(2)],
then $g^{-1}(\sigma(t))$ = [(2)] or [(1)(2)] respectively. In general, we
can denote the jth successive application of $g^{-1}$ to $\sigma$ as
$g^{-j}(\sigma)$ for $j \ge 0$. Hence
$g^{-j}(\sigma) = g^{-1}(g^{-(j-1)}(\sigma))$. If j = 0, then
$g^{0}(\sigma) = \sigma$.
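
A minimal sketch of the inverse system function and its repeated
application follows. The names g_inv and steps_to_empty are hypothetical,
and a system state is assumed to be a frozenset of line-state tuples
with empty line states omitted.

```python
def f_inv(line):
    """Inverse line function of Lemma 3.2.1."""
    return tuple(r - 1 for r in line if r - 1 > 0)


def g_inv(system):
    """Inverse system function: apply f_inv to every line state."""
    return frozenset(p for p in (f_inv(l) for l in system) if p)


def steps_to_empty(system):
    """Smallest j such that j applications of g_inv give the empty state."""
    j = 0
    while system:
        system = g_inv(system)
        j += 1
    return j


print(sorted(g_inv(frozenset({(3,)}))))
print(sorted(g_inv(frozenset({(1, 3), (2,)}))))
print(steps_to_empty(frozenset({(1,), (2,)})))
```

The number of steps needed to reach the empty state equals the largest
module state present, since each application lowers every busy time by
one.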

Theorem 3.2.5 $V_{c-1}$ is the set of all regenerative system states for
given (a, c).

Proof We know that $V_{c-1}$ is the set of system states such that for
every $\sigma \in V_{c-1}$, $\exists\, \lambda \in \sigma$ such that
$c - 1 \in \lambda$. If $\sigma \in V_{c-1}$, then
$g^{-1}(\sigma) \in V_{c-2}$. If a state is regenerative then by
definition it is in $V_{c-1}$. Now, if all the states in $V_{c-1}$ are
not regenerative, then $\exists\, \sigma \in V_{c-1}$ such that $\sigma$
does not make transitions to states already generated. That is, at least
one of $g_r(\sigma)$ and ${}_{\lambda}g_a(\sigma)$, for an idle line
state $\lambda \in \sigma$, is not in $\bigcup_{i=0}^{c-1} V_i$. We know
that if $\sigma$ can make a transition to $g_r(\sigma)$, it can also make
transitions to ${}_{\lambda}g_a(\sigma)$ for any idle line state
$\lambda \in \sigma$. Hence it suffices to show that
$g_r(\sigma) \in \bigcup_{i=0}^{c-1} V_i$, since if
$g_r(\sigma) \in \bigcup_{i=0}^{c-1} V_i$, then
${}_{\lambda}g_a(\sigma) \in \bigcup_{i=0}^{c-1} V_i$ for any idle line
state $\lambda \in \sigma$.

Since $\sigma(t) \in V_{c-1}$ contains the module state c - 1, a
transition from $\sigma(t)$ to $g_r(\sigma)$ at t + 1 implies that the
module whose module state is c - 1 is idle at t + 1. Hence for $\sigma$,
the module state c - 1 does not contribute any state information at
t + 1. Therefore, for each $\sigma \in V_{c-1}$, remove the module state
c - 1 from its regenerative line state. Let $\sigma'$ represent the
resulting system state. Then for nonnull $\sigma'$,
$g_r(\sigma) = g_r(\sigma')$. If $\sigma$ = [(c - 1)], then
$\sigma'$ = [$\emptyset$] and $g_r(\sigma) = \sigma'$ = [$\emptyset$].
Notice that the largest module state in nonnull $\sigma'$ is less than
c - 1. For the (a, c) = (2, 4) example, if $\sigma$ = [(1, 3)(2)], then
$\sigma'$ = [(1)(2)] and $g_r(\sigma) = g_r(\sigma')$ = [(2)(3)]. Let
$V'_{c-1}$ represent the set of $\sigma'$ such that
$\sigma \in V_{c-1}$. If we show that
$\sigma' \in \bigcup_{i=0}^{c-2} V_i$, then
$g_r(\sigma') \in \bigcup_{i=0}^{c-1} V_i$. By successive application
of Lemma 3.2.2, $\exists\, j$ which will make
$g^{-j}(\sigma')$ = [$\emptyset$], for some $j \le c - 2$, since the
largest element of any nonnull line state in $\sigma'$ is less than
c - 1. Hence it is always possible to make transitions from
[$\emptyset$] to $\sigma'$ in j time instants. Hence
$\sigma' \in \bigcup_{i=0}^{c-2} V_i$. Since all states in $V_{c-2}$ make
transitions to states in $V_{c-1}$,
$g_r(\sigma') \in \bigcup_{i=0}^{c-1} V_i$. Therefore,
$g_r(\sigma) \in \bigcup_{i=0}^{c-1} V_i$ and the theorem follows. □


Hence $V_0, V_1, \ldots, V_{c-1}$ generate all the system states while
simultaneously creating all generative transitions. Thus
$S(a, c) = \bigcup_{i=0}^{c-1} V_i$. In the process of forming $V_c$,
all regenerative transitions are created. This completes the formation
of the system state graph.

Theorem 3.2.6 $V_c = S(a, c)$, if $\ell > a$ and
$m > \lfloor (c-1)/a \rfloor$.


Proof From Theorem 3.2.5, we know that every system state
$\sigma \in V_{c-1}$ always makes a transition to a state in S(a, c). We
need to show that for every state $\sigma_j \in S(a, c)$,
$\exists\, \sigma \in V_{c-1}$ which makes a transition to $\sigma_j$.
States in S(a, c) can be partitioned into two disjoint subsets, namely,
sets of rejection and acceptance states. That is,
$S(a, c) = \bigcup_{i=0}^{c-1} R_i \cup \bigcup_{i=0}^{c-1} A_i$. We
know that if $\sigma \in V_{c-1}$, $\sigma$ makes a transition to
$\sigma_n \in \bigcup_{i=0}^{c-1} R_i$ and also makes a transition to
$\sigma_\lambda \in \bigcup_{i=0}^{c-1} A_i$, where
$\sigma_\lambda = {}_{\lambda}g_a(\sigma)$, for each idle line state
$\lambda \in \sigma$. Since $\ell > a$ and at most a - 1 lines are busy,
there is always an idle line, and since $m > \lfloor (c-1)/a \rfloor$,
there is always an idle module on an idle line. Also from the previous
theorem, we know that if $\sigma \in V_{c-1}$, for every
$\sigma_\lambda \in \bigcup_{i=0}^{c-1} A_i$,
$\exists\, \sigma_n \in \bigcup_{i=0}^{c-1} R_i \mid \sigma_\lambda = (\sigma_n - [f_n(\lambda)]) \cup [f_a(\lambda)]$,
for idle line state $\lambda \in \sigma$. Hence we only need to show that
$g_r : V_{c-1} \to \bigcup_{i=0}^{c-1} R_i$ is an onto mapping in order
to prove the theorem.

There are two kinds of system states in $V_{c-1}$, namely, states in
which $\exists\, \lambda_j \in \sigma \mid \lambda_j = (c-1)$, and states
in which $\exists\, \lambda_j \in \sigma \mid c - 1 \in \lambda_j$ and
$|\lambda_j| > 1$. Hence the former system state can be represented by
$\sigma_A = [\lambda_1\ \lambda_2 \ldots \lambda_k\ \lambda_{k+1}]$,
where $\lambda_1, \lambda_2, \ldots, \lambda_{k+1}$ are line states and
$\lambda_{k+1} = (c - 1)$. The latter system state can be represented by
$\sigma_B = [\lambda_1\ \lambda_2 \ldots \lambda_k]$, where
$\lambda_1, \lambda_2, \ldots, \lambda_k$ are line states and
$\exists\, \lambda_j \in \sigma_B \mid c - 1 \in \lambda_j$ and
$|\lambda_j| > 1$.

$\sigma_n \in \bigcup_{i=0}^{c-1} R_i$ can be represented by
$\sigma_n = [\lambda_1'\ \lambda_2' \ldots \lambda_k']$, where
$\lambda_1', \lambda_2', \ldots, \lambda_k'$ are rejection line states.
A state $\sigma \in V_{c-1}$ can make a transition to $\sigma_n$ if
$\sigma_n = g_r(\sigma) = [\lambda_j' \mid \lambda_j' = f_n(\lambda_j),\ \lambda_j \in \sigma]$,
from Theorem 3.2.3. It can be seen that if $\sigma = \sigma_A$ or
$\sigma_B$, the transition from $\sigma \in V_{c-1}$ to
$\sigma_n \in \bigcup_{i=0}^{c-1} R_i$ may occur.

- 63  - 


An example may be appropriate here. In order to obtain $S(2, 4)$ and the system state graph for $(a, c) = (2, 4)$, start with $V_0 = \{[\emptyset]\}$. Then $V_1 = A_1 = \{[(1)]\}$, where $[(1)] = {}_{\emptyset}g_a([\emptyset])$. Hence there is a directed edge from state $[\emptyset]$ to $[(1)]$ as shown in figure 3.2.1. The arc from the null system state to the acceptance state $[(1)]$ is identified by ${}_{\emptyset}g_a$.

In general, arc labels in figure 3.2.1 are identified by $g_r$ if the transition is due to a rejection and by ${}_{\lambda}g_a$ if the transition is due to an acceptance on the line represented by the idle line state $\lambda$. $R_2 = \{[(2)]\}$, where $[(2)] = g_r([(1)])$, and $A_2 = \{[(1)(2)]\}$, where $[(1)(2)] = {}_{\emptyset}g_a([(1)])$. Hence $V_2 = A_2 \cup R_2 = \{[(2)], [(1)(2)]\}$. Similarly, $R_3 = \{[(3)], [(2)(3)]\}$ and $A_3 = \{[(1)(3)], [(1, 3)], [(1)(2)(3)], [(1, 3)(2)]\}$. $V_3 = A_3 \cup R_3$. Notice that all elements of $V_i$, for $1 \le i \le c - 1$, contain a line state containing $i$. In particular, for $i = c - 1 = 3$, $V_3$ is the set of regenerative states. In creating all the regenerative transitions, notice that $R_4 = \bigcup_{i=0}^{3} R_i$ and $A_4 = \bigcup_{i=0}^{3} A_i$. Therefore, $V_4 = S(2, 4) = \bigcup_{i=0}^{3} V_i$.
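The generation procedure of this example can be sketched in code. The following is a minimal illustration, not the report's own algorithm: it assumes there are always enough lines and modules for an acceptance to be possible, represents a line state as a sorted tuple and a system state as a frozenset of non-null line states, and uses the hypothetical helper names `age`, `successors` and `system_states`.

```python
from collections import deque

def age(line, c):
    """Nonacceptance step for one line: each busy module ages by one time unit;
    a module whose state would exceed c - 1 becomes idle and is dropped."""
    return tuple(x + 1 for x in line if x + 1 <= c - 1)

def successors(state, a, c):
    """Successor system states of `state`, a frozenset of non-null line states."""
    lines = sorted(state)
    aged = [age(line, c) for line in lines]
    rejected = frozenset(t for t in aged if t)      # g_r(state)
    # Rejection successor, and acceptance on a line with no busy modules.
    # (For the null state the rejection edge has probability 0; it only
    # yields the null state again, so the set of states is unaffected.)
    result = {rejected, rejected | {(1,)}}
    for i, line in enumerate(lines):
        if min(line) >= a:                          # idle line: address cycle is over
            rest = frozenset(t for n, t in enumerate(aged) if n != i and t)
            result.add(rest | {(1,) + aged[i]})
    return result

def system_states(a, c):
    """Breadth-first generation of S(a, c), starting from the null system state."""
    start = frozenset()
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in successors(queue.popleft(), a, c):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(len(system_states(2, 4)))   # 10 states, as enumerated in the example
```

Under these assumptions the search reproduces the ten states of $S(2, 4)$ listed above.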

In addition to the graphical illustration of state transformations, the system state graph aids in the performance analysis of the system. The performance of the system will be measured by the probability of accepting a request into the memory system. In order to determine the probability of acceptance, the probability of transition from one system state to the other should be known.

Definition 3.2.18  The probability of transition, $p_{ij}$, is the conditional probability of going from system state $\sigma_i(t)$, at time $t$, to its successor system state $\sigma_j(t+1)$, at time $t + 1$. Rewriting this statement in probability notation, $p_{ij} = P(\sigma_j, t + 1 \mid \sigma_i, t)$, where $t$ is an integer. □


Figure 3.2.1  Generation of System State Graph, $G_s(2, 4)$, for $(a, c) = (2, 4)$


For convenience, however, we will denote the probability of transition from system state $\sigma(t)$ to its successor acceptance state, ${}_{\lambda}g_a(\sigma)$, by ${}_{\lambda}p_a(\sigma)$. Similarly, we will denote the probability of transition from system state $\sigma(t)$ to its successor rejection state, $g_r(\sigma)$, by $p_r(\sigma)$. Such transition probabilities imply that the state of the system at any time instant, $t$, is stochastically determined by its state at the preceding time instant. The transition probabilities from a particular system state, $\sigma(t)$, to all states in the system, $S(a, c)$, must add up to 1. Since the transitions from $\sigma(t)$ are to its successor rejection state and acceptance states,

$$p_r(\sigma) + \sum_{\substack{\text{idle line states} \\ \lambda \text{ in } \sigma(t)}} {}_{\lambda}p_a(\sigma) = 1.$$

Theorem 3.2.7  The probability of transition from the system state $\sigma(t)$ to its successor rejection system state, $g_r(\sigma) = \sigma_n(t + 1)$, is

$$p_r(\sigma) = \frac{mk + j}{N},$$

where $k$ = total number of busy line states in $\sigma(t)$;
$j$ = total number of elements in idle line states in $\sigma(t)$;
$m$ = total number of memory modules on a line; and
$N$ = total number of memory modules in the system.

Proof  A request is rejected by the system if the request addresses any of the busy lines or busy modules on idle lines at $t$. The probability of a request addressing a busy line at $t$ is (the total number of busy lines at $t$)/(total lines in the system). The total number of busy lines at $t$ = total number of busy line states in $\sigma(t)$ = $k$. Hence the probability of a request addressing a busy line at $t$ = $k/\ell$. The probability of a request addressing a busy module on an idle line at $t$ = (the total number of busy modules on idle lines at $t$)/(total number of modules in the system). The total number of busy modules on idle lines at $t$ = total number of nonnull module states in idle line states in $\sigma(t)$ =

$$\sum_{\lambda(t) \in \sigma(t) \,\mid\, \lambda(t) = \text{idle line state}} |\lambda(t)| = j.$$

Hence the probability of a request addressing a busy module on an idle line at $t$ = $j/N$. Therefore, $p_r(\sigma) = \frac{k}{\ell} + \frac{j}{N}$. Since $N = \ell m$, $p_r(\sigma) = (mk + j)/N$. □

For example, consider the computation of $p_r([(1)(2)])$ and $p_r([(2)(3)])$ for $(a, c) = (2, 4)$. For $[(1)(2)]$, $k = 1$ and $j = 1$, hence $p_r([(1)(2)]) = (m + 1)/N$. For $[(2)(3)]$, $k = 0$ and $j = 2$, hence $p_r([(2)(3)]) = 2/N$.
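The two values above can be checked numerically. This is a small sketch of the formula of Theorem 3.2.7 only; the function name `p_reject`, the tuple encoding of line states, and the sample configuration $\ell = 4$, $m = 3$ are assumptions made for illustration.

```python
from fractions import Fraction

def p_reject(state, a, l, m):
    """Rejection probability (m*k + j) / N for a system state given as an
    iterable of non-null line states (tuples of busy module states)."""
    lines = list(state)
    k = sum(1 for line in lines if min(line) < a)           # busy lines
    j = sum(len(line) for line in lines if min(line) >= a)  # busy modules on idle lines
    return Fraction(m * k + j, l * m)

# (a, c) = (2, 4) with an assumed configuration of l = 4 lines, m = 3 modules per line
print(p_reject([(1,), (2,)], a=2, l=4, m=3))   # (m + 1)/N = 4/12
print(p_reject([(2,), (3,)], a=2, l=4, m=3))   # 2/N = 2/12
```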

Evaluation of the probability of transition from the system state $\sigma(t) \in S(a, c)$ to its successor acceptance state is not as straightforward as that for the successor rejection state because there are different possible successor acceptance states. Moreover, for regenerative transitions, we recall that references to more than one idle line state in $\sigma(t) \in V_{c-1}$ may result in transitions to the same acceptance state. For the $(a, c) = (2, 4)$ example, $[(1)(2)(3)]$ makes a transition to $[(1)(2)(3)]$ if the request references the idle line state, $\emptyset$. Similarly, $[(1)(2)(3)]$ can also make a transition to $[(1)(2)(3)]$ if the request references the idle line state, $(3)$.

In general, a generative system state $\sigma(t)$ makes a generative transition to $\sigma_A(t + 1) = {}_{\emptyset}g_a(\sigma) = [(1)] \cup g_r(\sigma)$, if and only if the request references the idle line state, $\lambda = \emptyset \subset \sigma(t)$. However, a regenerative system state, $\sigma(t)$, makes a regenerative transition to

$$\sigma_A(t + 1) = [(1)] \cup g_r(\sigma)$$

if and only if the request references the idle line state $\lambda = \emptyset$ or $\lambda = (c-1)$, both of which may be in $\sigma(t)$.

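Theorem 3.2.8  The probability of transition from the system state $\sigma(t)$ to its successor acceptance state of the form $\sigma_A(t + 1) = {}_{\lambda}g_a(\sigma)$ is

$${}_{\lambda}p_a(\sigma) = \begin{cases} \dfrac{N - |\sigma|\, m}{N}, & \text{if } \lambda = \emptyset, \\[2ex] \dfrac{m - |\lambda|}{N}, & \text{otherwise.} \end{cases}$$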

Proof  The probability of transition from system state $\sigma(t)$ to $\sigma_A(t + 1) = [(1)] \cup g_r(\sigma)$ is the probability of a request referencing a line with a null line state at $t$, which is equal to (the total number of lines with null line states at $t$)/$\ell$. The total number of lines with null line states at $t$ = the total number of lines with no busy modules at $t$ = $\ell$ - (the total number of lines with busy modules at $t$) = $\ell - |\sigma|$. Hence the probability of referencing a line with a null line state at $t$ = $(\ell - |\sigma|)/\ell = (N - |\sigma|\, m)/N$, since $N = \ell m$. Therefore, ${}_{\emptyset}p_a(\sigma) = (N - |\sigma|\, m)/N$, if the transition is only due to acceptance on a line with no busy modules.

The probability of transition from a system state $\sigma(t)$ to ${}_{\lambda}g_a(\sigma)$, where $\lambda \ne \emptyset$, is the probability of a request referencing an idle module on the idle line represented by $\lambda(t) \in \sigma(t)$. The probability of a request referencing an idle module on that line, given that it addresses that line at $t$, = (the total number of idle modules on that line at $t$)/$m$. The total number of idle modules on that line at $t$ = $m$ - (the total number of busy modules on that line at $t$) = $m - |\lambda|$. ${}_{\lambda}p_a(\sigma)$ = (the probability of a request referencing the line at $t$) $\times$ (the probability of the request referencing an idle module on the line at $t$). Therefore,

$${}_{\lambda}p_a(\sigma) = \frac{1}{\ell} \cdot \frac{m - |\lambda|}{m} = \frac{m - |\lambda|}{N}. \qquad \Box$$

Notice however, that for regenerative transitions, transition from $\sigma(t)$ to $[(1)] \cup g_r(\sigma)$ may be due to acceptance on a line with no busy modules or on a line with one busy module in state $(c - 1)$. Hence both transition probabilities, ${}_{\emptyset}p_a(\sigma)$ and ${}_{(c-1)}p_a(\sigma)$, correspond to transitions to the same state, $[(1)] \cup g_r(\sigma)$. Therefore the transition probability from $\sigma(t)$ to state $[(1)] \cup g_r(\sigma)$ is ${}_{\emptyset}p_a(\sigma) + {}_{(c-1)}p_a(\sigma) = (N - |\sigma|\, m)/N + (m - 1)/N = (N - (|\sigma| - 1)m - 1)/N$.

For the $(a, c) = (2, 4)$ example, the probability of transition from $[(1)(2)]$ to $[(1)(2)(3)]$ is ${}_{\emptyset}p_a([(1)(2)]) = (N - 2m)/N$. Since transition from $\sigma = [(1)(2)(3)]$ to $[(1)(2)(3)]$ may be due to ${}_{\emptyset}p_a(\sigma)$ or ${}_{(3)}p_a(\sigma)$, the transition probability from $[(1)(2)(3)]$ to $[(1)(2)(3)]$ is ${}_{\emptyset}p_a(\sigma) + {}_{(3)}p_a(\sigma) = (N - 2m - 1)/N$. The probabilities of transition from $[(1)(2)(3)]$ to $[(1, 3)(2)]$ and from $[(1, 3)(2)]$ to $[(1, 3)(2)]$ are each $(m - 1)/N$. Recognize that in both cases, the idle line state is $(2)$. Figure 3.2.2 shows the system state graph, $G_s(2, 4)$, in which each arc is labeled with the corresponding probability of transition.
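As a cross-check of Theorems 3.2.7 and 3.2.8, the sketch below computes every outgoing transition probability of a system state and merges probabilities that lead to the same successor, as in the regenerative case. It is an illustration under assumptions: the function name `transition_probs`, the frozenset-of-tuples encoding, and the sample configuration $\ell = 5$, $m = 3$ (so $N = 15$) are not from the report.

```python
from fractions import Fraction

def transition_probs(state, a, c, l, m):
    """Map each successor of `state` to its transition probability."""
    N = l * m
    lines = sorted(state)
    aged = [tuple(x + 1 for x in line if x + 1 < c) for line in lines]
    rejected = frozenset(t for t in aged if t)               # g_r(state)
    probs = {}

    def add(target, p):
        # Different references may reach the same successor; sum them.
        probs[target] = probs.get(target, 0) + p

    k = sum(1 for line in lines if min(line) < a)            # busy lines
    j = sum(len(line) for line in lines if min(line) >= a)   # busy modules on idle lines
    add(rejected, Fraction(m * k + j, N))                    # Theorem 3.2.7
    add(rejected | {(1,)}, Fraction(N - len(lines) * m, N))  # acceptance on a null line
    for i, line in enumerate(lines):
        if min(line) >= a:                                   # acceptance on idle line i
            rest = frozenset(t for n, t in enumerate(aged) if n != i and t)
            add(rest | {(1,) + aged[i]}, Fraction(m - len(line), N))
    return probs

sigma = frozenset({(1,), (2,), (3,)})
probs = transition_probs(sigma, a=2, c=4, l=5, m=3)
print(probs[sigma])            # (N - 2m - 1)/N = 8/15
print(sum(probs.values()))     # 1
```

With these sample values the self-loop on $[(1)(2)(3)]$ comes out as $(N - 2m - 1)/N$ and the probabilities out of the state sum to 1.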


Figure 3.2.2  System State Graph, $G_s(2, 4)$, with Transition Probabilities


3.3  State  Reduction  and  Line  Decomposition 

A brief investigation of system state graphs reveals that the number of states in a system state graph, for sufficiently large values of a and m, increases dramatically as c increases relative to a. For example, if the number of system states in $S(a, c)$ is denoted by $|S(a, c)|$, then $|S(2, 2)| = 2$, $|S(2, 4)| = 10$ and $|S(2, 6)| = 57$. However, obtaining a formula for $|S(a, c)|$ is very complicated.

The  main  objective  of  the  performance  analysis  is  to  obtain  the 
performance  of  the  system  under  various  parameters. 

Definition 3.3.1  The steady state probability of acceptance, $P_a(a, c, p)$, is the steady state probability that a request issued by a parallel-pipelined processor of order $(s, p)$ will be accepted by the $(\ell, m)$ memory configuration with module characteristics $(a, c)$.

As the size of the system state graph grows for $a > 1$, the complexity of computing $P_a(a, c, p)$ for $p = 1$ also grows. For $p = 1$, $P_a(a, c, p)$ is the probability of being in a certain set of states, namely, the set of acceptance states, which is a subset of $S(a, c)$. Since the objective is to obtain $P_a(a, c, 1)$, a reduction of the system state graph is possible by collapsing the states which literally appear identical to a request attempting to access the system in those states.

In this section an attempt is made to discover and eliminate unnecessary states in the state graph. It is also shown that since all lines in the memory system are identical and independent, a single line model, instead of the total system model used thus far, can be developed to simplify analysis of the system significantly.

Before the system state reduction is discussed, certain relationships between line states in the set of line states $\Lambda(a, c)$ will be investigated.

Recall that a busy line may have more than one busy module on it. Consider the case for $(a, c) = (2, 4)$ in which a request addresses a busy line represented by the busy line state $\lambda = (1, 3)$. The request will be rejected, causing a transition from $(1, 3)$ to $f_n(1, 3) = (2)$. The same next state would result if the addressed line was represented by $(1)$. Therefore, there may be some cases, as demonstrated above, in which the next line states of two distinct busy line states are identical.

Let $f_n^j(\lambda)$ represent the nonacceptance state of $\lambda$ arrived at by successive application of the line nonacceptance function $j$ times, for an integer $j \ge 0$. That is, $f_n^j(\lambda) = f_n(f_n^{j-1}(\lambda))$, where $f_n^1(\lambda) = f_n(\lambda)$ and $f_n^0(\lambda) = \lambda$.

Definition 3.3.2  Two line states, $\lambda_1$ and $\lambda_2$, in $\Lambda(a, c)$ are equivalent, written $\lambda_1 \sim \lambda_2$, if they are identical or if they have identical smallest nonnull element $r_{min} < a$, such that $f_n^j(\lambda_1) = f_n^j(\lambda_2)$, for $j = a - r_{min}$. □

This definition includes idle line states. However, an idle line state $\lambda \in \Lambda(a, c)$ is only equivalent to itself, i.e., $\lambda \sim \lambda$. Although each line state is equivalent to itself, it is easy to test for equivalence with other line states by applying the definition. However, only busy line states need be considered to find equivalence among distinct states. Moreover, for distinct busy line states, only pairs with identical smallest element need be compared.

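Theorem 3.3.1  If $\lambda_1 \sim \lambda_2$, then $f_n^i(\lambda_1) \sim f_n^i(\lambda_2)$, for $i \ge 0$.

Proof: If $\lambda_1 = \lambda_2$, the proof is trivial since $f_n^i(\lambda_1) = f_n^i(\lambda_2)$, $\forall\, i \ge 0$. However, if $\lambda_1 \ne \lambda_2$, for $j = (a - r_{min}) > 0$, where $r_{min}$ is the identical smallest element in $\lambda_1$ and $\lambda_2$, $f_n^j(\lambda_1) = f_n^j(\lambda_2)$. We need to show that if $\lambda_1 \sim \lambda_2$ then $f_n(\lambda_1) \sim f_n(\lambda_2)$. Figure 3.3.1 illustrates the problem. Since $\lambda_1 \ne \lambda_2$ and $\lambda_1 \sim \lambda_2$, $a > r_{min}$. From theorem 3.2.1, $f_n(\lambda_i) = (x \mid x - 1 \in \lambda_i(t) \text{ and } x < c)$. Hence the identical smallest element in $f_n(\lambda_1)$ and $f_n(\lambda_2)$ is $r_{min} + 1 = r'_{min}$. Let $j' = (a - r'_{min}) \ge 0$. Note that $j' = j - 1$. Since $f_n^j(\lambda_1) = f_n^j(\lambda_2)$, $f_n^{j'}(f_n(\lambda_1)) = f_n^{j'}(f_n(\lambda_2))$. Therefore by $i$ applications, $\lambda_1 \sim \lambda_2 \Rightarrow f_n^i(\lambda_1) \sim f_n^i(\lambda_2)$, $\forall\, i \ge 0$. □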


This theorem states that if two line states are equivalent, their successor states are also equivalent and possibly identical. For example, consider the two distinct line states $\lambda_1 = (1, 4)$ and $\lambda_2 = (1, 5)$ in $\Lambda(3, 6)$. $j = a - r_{min} = 3 - 1 = 2$ and $f_n^2(1, 4) = (3) = f_n^2(1, 5)$. Hence $(1, 4) \sim (1, 5)$. Therefore, $f_n^i(1, 4) \sim f_n^i(1, 5)$, for $i \ge 0$. Hence $(2, 5) \sim (2)$, $(3) \sim (3)$, $(4) \sim (4)$, and so forth. In common terms, two busy line states are equivalent if they differ only in elements large enough to be incremented beyond $c - 1$ by the time the line is idle.
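The equivalence test of Definition 3.3.2 is mechanical, so a short sketch may help. The names `f_n` and `equivalent` and the tuple encoding of line states (sorted busy module states, empty tuple for the null line state) are illustrative assumptions.

```python
def f_n(line, c):
    """Line nonacceptance function: age every element, drop those beyond c - 1."""
    return tuple(x + 1 for x in line if x + 1 <= c - 1)

def equivalent(l1, l2, a, c):
    """Definition 3.3.2: identical, or busy with the same smallest element r_min
    and identical states after j = a - r_min nonacceptance steps."""
    if l1 == l2:
        return True
    if not l1 or not l2 or min(l1) != min(l2) or min(l1) >= a:
        return False          # distinct idle states are never equivalent
    for _ in range(a - min(l1)):
        l1, l2 = f_n(l1, c), f_n(l2, c)
    return l1 == l2

print(equivalent((1, 4), (1, 5), a=3, c=6))   # True
print(equivalent((2, 5), (2,), a=3, c=6))     # True
print(equivalent((2,), (3,), a=2, c=4))       # False
```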

Figure 3.3.1  Illustration of Theorem 3.3.1


The equivalence relation defined above on $\Lambda(a, c)$ partitions $\Lambda(a, c)$ into a set of equivalence classes, $C_\ell(a, c)$, called line equivalence classes. All states in each class are equivalent to each other and no state in one class is equivalent to any state in any other class. For example, the line equivalence classes for $(a, c) = (2, 4)$ are $\{\emptyset\}$, $\{(1), (1, 3)\}$, $\{(2)\}$ and $\{(3)\}$.

Definition 3.3.3  Two system states, $\sigma_1$ and $\sigma_2$ in $S(a, c)$, are equivalent, written $\sigma_1 \sim \sigma_2$, if for every $\lambda_i \in \sigma_1$ $\exists$ a unique $\lambda_k \in \sigma_2$ such that $\lambda_i \sim \lambda_k$ and for every $\lambda_k \in \sigma_2$ $\exists$ a unique $\lambda_i \in \sigma_1$ such that $\lambda_k \sim \lambda_i$. □

Since all the elements of a system state are distinct, $|\sigma_1| = |\sigma_2|$. For example, in $S(3, 6)$, $[(1, 4)(2, 5)(3)] \sim [(1)(2)(3)]$, since $(1, 4) \sim (1)$, $(2, 5) \sim (2)$ and $(3) \sim (3)$. As with line state equivalence, system state equivalence is an equivalence relation on $S(a, c)$ and hence partitions $S(a, c)$ into a set of equivalence classes, $C_s(a, c)$, called system equivalence classes. For the example of $S(2, 4)$, the system equivalence classes are $\{[\emptyset]\}$, $\{[(1)], [(1, 3)]\}$, $\{[(2)]\}$, $\{[(1)(2)], [(1, 3)(2)]\}$, $\{[(3)]\}$, $\{[(1)(3)]\}$, $\{[(2)(3)]\}$ and $\{[(1)(2)(3)]\}$.
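The partition of $S(2, 4)$ into system equivalence classes can be reproduced with a canonical-form trick: a busy line state is reduced to its smallest element plus those elements that survive until the line becomes idle, and two system states are grouped together when their sets of canonical line states coincide. This is a sketch under assumptions, not the report's procedure; `canonical` and `system_classes` are hypothetical names, and the ten states are copied from the example above.

```python
from collections import defaultdict

def canonical(line, a, c):
    """Smallest representative of a line state's equivalence class."""
    if not line or min(line) >= a:
        return line                       # idle states are equivalent only to themselves
    r_min = min(line)
    j = a - r_min                         # steps until the line becomes idle
    return tuple(x for x in line if x == r_min or x + j <= c - 1)

def system_classes(states, a, c):
    """Group system states whose line states are pairwise equivalent."""
    classes = defaultdict(list)
    for state in states:
        key = frozenset(canonical(line, a, c) for line in state)
        classes[key].append(state)
    return list(classes.values())

# The ten states of S(2, 4), each written as a tuple of line states.
S_2_4 = [(), ((1,),), ((2,),), ((1,), (2,)), ((3,),), ((2,), (3,)),
         ((1,), (3,)), ((1, 3),), ((1,), (2,), (3,)), ((1, 3), (2,))]

classes = system_classes(S_2_4, a=2, c=4)
print(len(classes))   # 8 system equivalence classes
```

Grouping by a canonical key relies on the observation that line states within one system state have distinct smallest elements, so the matching required by Definition 3.3.3 is automatically unique.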

In order to determine $C_s(a, c)$, it is only necessary to consider pairs of system states whose cardinalities are identical and have identical sets of smallest elements in their line states. The reduced set of system states, $S'(a, c)$, represents the system equivalence classes for the system state graph, $G_s(a, c)$. Any state in an equivalence class can be selected to represent the set of states in the equivalence class. However, for consistency, the state with the least number of module states in each equivalence class will be selected as the system state representing that equivalence class. In the $C_s(2, 4)$ example, $[(1)]$ would represent $\{[(1)], [(1, 3)]\}$ and $[(1)(2)]$ would represent $\{[(1)(2)], [(1, 3)(2)]\}$.

With the reduced set of states, $S'(a, c)$, the reduced state graph can be obtained by merging each state not in $S'(a, c)$ with its equivalent state in $S'(a, c)$. By so doing, the edges inbound to such a state not in $S'(a, c)$ are transferred to the state's equivalent in $S'(a, c)$. The graph thus obtained is the reduced state graph, $G'_s(a, c)$. Notice that if any two states, $\sigma_j, \sigma_k \in S(a, c)$, are equivalent such that $\sigma_j \in S'(a, c)$ but $\sigma_k \notin S'(a, c)$, then the transfer of edges to $\sigma_j$, which were inbound to $\sigma_k$, implies that the new transition probability, $p_{ij}$, from any system state $\sigma_i \in S'(a, c)$ to $\sigma_j$ in $G'_s(a, c)$, is the old transition probability, $p_{ik}$, in $G_s(a, c)$.

The algorithm for obtaining $G'_s(a, c)$ from $G_s(a, c)$ is summarized below.

Step 1.  Form $C_s(a, c)$: Partition $S(a, c)$ into system equivalence classes.

Step 2.  Form $S'(a, c)$: In each equivalence class, select the state with the least number of module states.

Step 3.  Draw all outgoing edges from each state in $S'(a, c)$ as in $G_s(a, c)$, except outgoing edges that terminate at states not in $S'(a, c)$.

Step 4.  Redirect all incoming edges to states not in $S'(a, c)$ to their respective equivalent states in $S'(a, c)$.
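Steps 3 and 4 amount to a simple graph rewrite, sketched below on a small hand-built graph. Everything here is illustrative: the function name `reduce_graph`, the dictionary encoding of edges, and the toy graph (five states with probabilities chosen for the example, in which state `(1, 3)` is represented by `(1,)`) are assumptions. Where a redirected edge coincides with an existing edge, the sketch adds the two probabilities, which is an assumption beyond what the text states.

```python
from fractions import Fraction

def reduce_graph(edges, rep):
    """edges: {state: {successor: probability}}; rep: state -> class representative.
    Keeps only the outgoing edges of representatives (Step 3) and redirects
    every edge to the representative of its target (Step 4)."""
    reduced = {}
    for state, out in edges.items():
        if rep[state] != state:
            continue                               # non-representatives are merged away
        merged = {}
        for target, p in out.items():
            merged[rep[target]] = merged.get(rep[target], 0) + p
        reduced[state] = merged
    return reduced

acc, rej = Fraction(1, 4), Fraction(3, 4)          # sample probabilities
edges = {
    (): {(1,): Fraction(1, 2), (): Fraction(1, 2)},
    (1,): {(2,): Fraction(1)},
    (2,): {(1, 3): acc, (3,): rej},
    (3,): {(1,): acc, (): rej},
    (1, 3): {(2,): Fraction(1)},
}
rep = {s: s for s in edges}
rep[(1, 3)] = (1,)                                 # Steps 1 and 2, done by hand here

reduced = reduce_graph(edges, rep)
print(sorted(reduced))                             # four remaining states
print(reduced[(2,)])                               # edge to (1, 3) now points at (1,)
```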


Applying the algorithm to the example $G_s(2, 4)$ of figure 3.2.2, which is repeated in figure 3.3.2(a), produces the reduced system state graph, $G'_s(2, 4)$, shown in figure 3.3.2(b). From the reduced state graph, $G'_s(2, 4)$, the steady state probability of acceptance $P_a(a, c, p)$ for $p = 1$ and any $(\ell, m)$ memory configuration can be obtained for the module characteristics $(a, c) = (2, 4)$:

$$P_a(2, 4, 1) = \frac{N^2}{N^2 + Nm + 2N - m + 1},$$

where $N = \ell m$. $P_a(2, 4, 1)$ is the sum of the probabilities of being in the acceptance states $[(1)]$, $[(1)(2)]$, $[(1)(3)]$ and $[(1)(2)(3)]$ in $G'_s(2, 4)$. The detailed computation of $P_a(2, 4, 1)$ follows standard Markov analysis techniques, but is fairly complicated, as shown in Appendix A.

In general, the computation of $P_a(a, c, 1)$ can be much simplified if it is observed that the $\ell$ lines of the memory system are identical and independent. Since it was assumed that requests are uniformly distributed among the $\ell$ lines, a line decomposed model would suffice to obtain the steady state probability of acceptance, $P_a(a, c, p)$, for $p = 1$. A line decomposed model is the Markov model of one line.

The line decomposed model is now investigated with an aim toward obtaining a simpler model for obtaining $P_a(a, c, 1)$. The set of line states, $\Lambda(a, c)$, are the states of the line decomposed model for module characteristics $(a, c)$.

Figure 3.3.2(a)  System State Graph, $G_s(2, 4)$

Figure 3.3.2(b)  Reduced System State Graph, $G'_s(2, 4)$

A line state graph, $G_\ell(a, c)$, which is similar to a system state graph, can be used to display the next line state function. It consists of one node for each line state in $\Lambda(a, c)$ and one edge leaving each node for each possible acceptance or nonacceptance transition. A simple technique is given below to generate the line state graph for module characteristics $(a, c)$.

Let $W_0 = \{\emptyset\}$ and $W_n$ be the set of new line states generated by a one-step transition from the line states in $W_{n-1}$, for $1 \le n \le c - 1$. Furthermore, $W_c$ is the set of line states generated by a one-step transition from line states in $W_{c-1}$. The set of nonacceptance states generated by states in $W_{n-1}$ is $R_n$, where $R_n = \{\lambda \mid \lambda = f_n(\lambda_i),\ \lambda_i \in W_{n-1}\}$. Similarly, the set of acceptance states generated by states in $W_{n-1}$ is $A_n$, where $A_n = \{\lambda \mid \lambda = f_a(\lambda_i),\ \lambda_i \in W_{n-1} \text{ and } \lambda_i = \text{idle line state}\}$. $R_1 = \{\,\}$ and $A_1 = \{(1)\}$. $W_n = A_n$ for $n = 1$, and $W_n = A_n \cup R_n$, for $1 < n \le c - 1$. Notice that $R_1$ is empty since the state $\emptyset \in W_0$ is not new.

Theorem 3.3.2  $W_{c-1}$ is the set of all regenerative line states for given $(a, c)$. □

This theorem is similar to Theorem 3.2.5 whose proof is given. Hence, $W_0, W_1, \ldots, W_{c-1}$ generate all the line states while simultaneously producing the generative transitions of the line state graph. Thus $\Lambda(a, c) = \bigcup_{i=0}^{c-1} W_i$. Recall that, for $1 \le i \le c - 1$, the largest element in any line state in $W_i$ is $i$, hence transitions from line states in $W_{i-1}$ to $W_i$ are generative. In particular, transition from the line state $\emptyset$ to $\emptyset$ is regenerative. Transitions from line states in $W_{c-1}$ to $W_c$ are regenerative. Hence, the generation of $A_c$ and $R_c$ produces all the regenerative transitions and thereby completes the line state diagram.


Theorem 3.3.3  $W_c \subseteq \Lambda(a, c)$. □

The proof of this theorem is trivial since transitions from $W_{c-1}$ are regenerative.

As an example, consider the formation of $G_\ell(2, 4)$. Figure 3.3.3 shows the line state graph, $G_\ell(2, 4)$. Marked nodes are acceptance line states. $W_0 = \{\emptyset\}$, then $R_1 = \{\,\}$ and $A_1 = \{(1) = f_a(\emptyset)\}$. $W_1 = \{(1)\}$. $R_2 = \{(2) = f_n(1)\}$ and $A_2 = \{\,\}$, hence $W_2 = A_2 \cup R_2 = \{(2)\}$. $R_3 = \{(3) = f_n(2)\}$ and $A_3 = \{(1, 3) = f_a(2)\}$, hence $W_3 = \{(3), (1, 3)\}$. This completes the generation phase. The regeneration phase is given by $R_4 = \{\emptyset = f_n(3),\ (2) = f_n(1, 3)\}$ and $A_4 = \{(1) = f_a(3)\}$. The transition probabilities are obtained with the aid of the following theorem.


Theorem 3.3.4  The probability of transition from an idle line state $\lambda(t)$ to its successor acceptance line state is

$$p_a(\lambda) = \frac{m - |\lambda|}{N},$$

where $|\lambda|$ is the cardinality of the line state $\lambda(t)$.

Proof: Since $\lambda(t)$ is an idle line state, a request which addresses the line corresponding to $\lambda(t)$ will be accepted if the request addresses an idle module on the line in question. The number of idle modules on the line represented by $\lambda(t)$ is $m - |\lambda|$, where $|\lambda|$ is the number of busy modules on the line. Given a request to the line, the conditional probability of requesting any one of the idle modules on the line = $(m - |\lambda|)/m$. Since the probability of requesting the line is $1/\ell$,

$$p_a(\lambda) = \frac{1}{\ell} \cdot \frac{m - |\lambda|}{m} = \frac{m - |\lambda|}{N}. \qquad \Box$$

Corollary  The probability of transition from an idle line state, $\lambda(t)$, to its successor nonacceptance state is $p_n(\lambda) = 1 - p_a(\lambda)$. □

This is obvious since $p_n(\lambda) + p_a(\lambda) = 1$. Note that nonacceptance implies either rejection of a request to the line or arrival of a request to some other line.

Theorem 3.3.5  The probability of transition from a busy line state, $\lambda(t)$, to its successor nonacceptance state is $p_n(\lambda) = 1$. □

Since $\lambda(t)$ is a busy line state, there is no successor acceptance state, hence the result follows.

The example of $G_\ell(2, 4)$ is redrawn in figure 3.3.4 with the arcs labeled with the corresponding probability of transition.
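The construction of the line state graph together with the probabilities of Theorems 3.3.4 and 3.3.5 can be sketched as follows. This is an illustration, not the report's procedure: it explores states with a simple worklist instead of forming the sets $W_n$ level by level, the name `line_graph` is hypothetical, and the sample configuration $\ell = 2$, $m = 2$ is chosen only to give concrete numbers.

```python
from fractions import Fraction

def line_graph(a, c, l, m):
    """Return {line_state: {successor: probability}} for one line."""
    N = l * m
    edges, todo = {}, [()]
    while todo:
        lam = todo.pop()
        if lam in edges:
            continue
        aged = tuple(x + 1 for x in lam if x + 1 <= c - 1)    # f_n(lam)
        if not lam or min(lam) >= a:                           # idle line state
            p_a = Fraction(m - len(lam), N)                    # Theorem 3.3.4
            edges[lam] = {(1,) + aged: p_a, aged: 1 - p_a}     # f_a(lam), f_n(lam)
        else:                                                  # busy line state
            edges[lam] = {aged: Fraction(1)}                   # Theorem 3.3.5
        todo.extend(edges[lam])
    return edges

g = line_graph(a=2, c=4, l=2, m=2)
print(sorted(g))          # the five line states of Lambda(2, 4)
print(g[(2,)])            # acceptance leads to (1, 3), nonacceptance to (3,)
```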

Recall that the equivalence relation on $\Lambda(a, c)$ partitions $\Lambda(a, c)$ into disjoint equivalence classes. Hence in general, $\Lambda(a, c)$ can be reduced if there exist at least two states in $\Lambda(a, c)$ that are equivalent to each other. The reduced line state graph, $G'_\ell(a, c)$, can be obtained from $G_\ell(a, c)$ following an algorithm very similar to that used in obtaining $G'_s(a, c)$ from $G_s(a, c)$. For the example of $G_\ell(2, 4)$, figure 3.3.5 shows the reduced line state graph, $G'_\ell(2, 4)$, where equivalent line states $(1, 3)$ and $(1)$ are merged.

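From the reduced line state graph, $G'_\ell(2, 4)$, the steady state probability of being in each of the four states can be calculated. However, the crux of the analysis is in calculating the steady state probability, $P_{a\ell}(a, c, p)$, of being in an acceptance line state in the line decomposed system for $p = 1$. In $G'_\ell(2, 4)$, the only acceptance line state is $(1)$. Let $P_\lambda$ denote the probability of being in the line state, $\lambda$. Hence, for $p = 1$,

$$P_{a\ell}(a, c, p) = \sum_{\text{acceptance line states } \lambda \,\mid\, \lambda \in G'_\ell(a, c)} P_\lambda .$$

Then from $G'_\ell(2, 4)$,

$$P_\emptyset = \frac{\ell - 1}{\ell} P_\emptyset + \frac{N - m + 1}{N} P_{(3)} \qquad (1)$$

$$P_{(1)} = \frac{1}{\ell} P_\emptyset + \frac{m - 1}{N} \left[ P_{(3)} + P_{(2)} \right] \qquad (2)$$

$$P_{(2)} = P_{(1)} \qquad (3)$$

$$P_{(3)} = \frac{N - m + 1}{N} P_{(2)} = \frac{N - m + 1}{N} P_{(1)} \qquad (4)$$

$$P_\emptyset + P_{(1)} + P_{(2)} + P_{(3)} = 1 \qquad (5)$$

From equations 1 and 4, $P_\emptyset = \frac{\ell (N - m + 1)}{N} P_{(3)} = \frac{\ell (N - m + 1)^2}{N^2} P_{(1)}$.

Substituting for $P_\emptyset$, $P_{(2)}$ and $P_{(3)}$ in equation 5,

$$\left[ \frac{\ell (N - m + 1)^2}{N^2} + 2 + \frac{N - m + 1}{N} \right] P_{(1)} = 1.$$

Rearranging,

$$\frac{\ell (N - m + 1)^2 + 2N^2 + (N - m + 1)N}{N^2}\, P_{(1)} = 1.$$

Hence

$$P_{(1)} = \frac{N^2}{(N - m + 1)\,[\ell (N - m + 1) + N] + 2N^2}.$$

Since $\ell m = N$,

$$P_{(1)} = \frac{N^2}{(N - m + 1)(N + 1)\ell + 2N^2} = \frac{N^2}{(N^2 + Nm + 2N - m + 1)\,\ell}.$$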


Notice that in the reduced line state graph, $G'_\ell(2, 4)$, $\lambda = (1)$ is the only acceptance state. Hence, in this example, $P_{a\ell}(2, 4, 1) = P_{(1)}$. In general, $P_{a\ell}(a, c, p)$ is the probability of acceptance of a request on the line being modeled.

Theorem 3.3.6  For $p = 1$, the steady state probability of acceptance of a request in the L-M memory organization is

$$P_a(a, c, 1) = \ell\, P_{a\ell}(a, c, 1). \qquad \Box$$

This is obvious since all the $\ell$ lines of the L-M memory organization are identical and independent.

In the example of $(a, c) = (2, 4)$, $P_{a\ell}(a, c, 1) = P_{(1)}$. Hence,

$$P_a(2, 4, 1) = \frac{\ell N^2}{(N^2 + Nm + 2N - m + 1)\,\ell} = \frac{N^2}{N^2 + Nm + 2N - m + 1},$$

which is identical to the results obtained from the reduced system state graph, $G'_s(2, 4)$, but less tediously obtained.
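The closed form can be checked by solving the four-state reduced line chain exactly. The sketch below is only a numerical cross-check under assumptions: the solver `stationary` is a generic Gauss-Jordan routine written for this example, `p_accept_2_4` is a hypothetical name, and the state order (null, (1), (2), (3)) and transition matrix are read off Theorems 3.3.4 and 3.3.5 for $(a, c) = (2, 4)$.

```python
from fractions import Fraction

def stationary(P):
    """Exact stationary distribution of an irreducible chain with
    row-stochastic matrix P, by Gauss-Jordan elimination over Fractions."""
    n = len(P)
    # Row i encodes sum_j pi_j * P[j][i] - pi_i = 0.
    A = [[Fraction(P[j][i]) - (i == j) for j in range(n)] + [Fraction(0)]
         for i in range(n)]
    A[-1] = [Fraction(1)] * (n + 1)        # replace one equation by sum(pi) = 1
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

def p_accept_2_4(l, m):
    """P_a(2, 4, 1) = l * P_(1), from the reduced line chain."""
    N = l * m
    acc = Fraction(m - 1, N)               # acceptance from idle states (2) and (3)
    P = [[1 - Fraction(1, l), Fraction(1, l), 0, 0],   # null state
         [0, 0, 1, 0],                                  # (1), merged with (1, 3)
         [0, acc, 0, 1 - acc],                          # (2)
         [1 - acc, acc, 0, 0]]                          # (3)
    return l * stationary(P)[1]

l, m = 2, 2
N = l * m
print(p_accept_2_4(l, m))                               # 16/31
print(Fraction(N * N, N * N + N * m + 2 * N - m + 1))   # 16/31
```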

Hence in conclusion, the line decomposition of the system allows a vast simplification of the performance analysis problem. Line decomposition will be adopted in the rest of this analysis of the L-M memory organization.


3.4  Line  State  Space 

Although the number of line states is much less than the number of system states for any module characteristics $(a, c)$, an investigation of the line state graph, $G_\ell(a, c)$, also reveals that the number of line states in a line state graph increases as c increases relative to a. The complexity of computing $P_{a\ell}(a, c, 1)$ increases with the size of the line state graph.

In this section, we will characterize the number of line states in a line state graph, $G_\ell(a, c)$. It is also shown that the number of line states in $G_\ell(a, c)$ depends on the maximum number of busy modules on a line.

Theorem 3.4.1  The maximum number of busy modules on a line for module characteristics $(a, c)$ is $\lceil (c-1)/a \rceil$.

Proof: As usual, let the elements of a line state be listed in ascending order. It is easy to show that the maximum number of busy modules on a line occurs when the line is in the acceptance state $\lambda = (1, a + 1, 2a + 1, \ldots, ka + 1)$, where $k$ is the greatest integer such that $ka + 1 \le c - 1$. Notice that in this state adjacent elements have a difference of exactly $a$. It should be pointed out that this acceptance state may not be the only line state with the maximum number of busy modules. The maximum number of elements, and hence the maximum number of busy modules, in the acceptance state $\lambda$ is $k + 1$. We know that for an integer, $n$, and a real number, $x$, $\lceil x \rceil = n$ iff $x \le n < x + 1$. In order to show that this is true for $x = \frac{c-1}{a}$ and $n = k + 1$, we know that $ka + 1 \le c - 1 \Rightarrow ka < ka + 1 \le c - 1$, for integers $k$ and $a$ such that $k \ge 0$ and $a \ge 1$. Hence $ka < c - 1 \Rightarrow k < \frac{c-1}{a} \Rightarrow k + 1 < \frac{c-1}{a} + 1$. Since $k$ is the greatest integer such that $ka + 1 \le c - 1$ and $a \ge 1$, then $(k + 1)a + 1 > c - 1$. That is, $k + 1 > \frac{c-1}{a} - \frac{1}{a}$. Let $q$ and $r$ be integers such that $0 \le r \le a - 1$ and $\frac{c-2}{a} = q + \frac{r}{a}$. Hence, $\frac{c-1}{a} = q + \left(\frac{r}{a} + \frac{1}{a}\right)$ and $0 < \frac{r}{a} + \frac{1}{a} \le 1$. Since $k$ is the greatest integer such that $k + 1 > q + \frac{r}{a}$ and $0 < \frac{r}{a} + \frac{1}{a} \le 1$, it implies that $k + 1 \ge q + \frac{r}{a} + \frac{1}{a}$. Hence $k + 1 \ge \frac{c-1}{a}$. Therefore $k + 1 = \left\lceil \frac{c-1}{a} \right\rceil$. Combining the above results,

$$\frac{c-1}{a} \le k + 1 < \frac{c-1}{a} + 1.$$

Hence

$$|\lambda| \le \left\lceil \frac{c-1}{a} \right\rceil, \quad \text{for } \lambda \in \Lambda(a, c). \qquad \Box$$


Theorem 3.4.2  The maximum number of busy modules on an idle line for module characteristics $(a, c)$ is $\lfloor (c-1)/a \rfloor$.

Proof: It is trivial to show that the theorem holds for $a \ge c \ge 1$. Assume that $c > a \ge 1$. The maximum number of busy modules on an idle line occurs when the idle line state is $\lambda = (a, 2a, 3a, \ldots, ka)$, where $k$ is the greatest integer such that $ka \le c - 1$. Such a $k$ exists since $\frac{c-1}{a} \ge 1$. Then $(k + 1)a > c - 1$. Hence $k = \left\lfloor \frac{c-1}{a} \right\rfloor$. □

Hence $|\lambda| \le \left\lfloor \frac{c-1}{a} \right\rfloor$, for any idle line state $\lambda \in \Lambda(a, c)$. Recall from the last section that the transition probability of an idle line state $\lambda(t)$ to its successor acceptance state is $p_a(\lambda) = (m - |\lambda|)/N$. Hence $\left(m - \left\lfloor \frac{c-1}{a} \right\rfloor\right)/N \le p_a(\lambda) \le 1$, for $\lambda \in G_\ell(a, c)$. It will be shown in the next section that it is this number, $\left\lfloor \frac{c-1}{a} \right\rfloor$, that determines the classification of the performance analysis of various module characteristics.
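Both bounds can be confirmed by brute force for small parameters. The sketch assumes the characterization used throughout this section, namely that a line state is a set of busy module states drawn from $1, \ldots, c - 1$ with any two elements at least $a$ apart, and that a line is idle when it is null or its smallest element is at least $a$. The helper name `line_states` is hypothetical.

```python
from itertools import combinations

def line_states(a, c):
    """All line states for (a, c): subsets of {1, ..., c-1} whose
    elements are pairwise at least a apart (as sorted tuples)."""
    states = []
    for size in range(c):
        for combo in combinations(range(1, c), size):
            if all(y - x >= a for x, y in zip(combo, combo[1:])):
                states.append(combo)
    return states

for a in range(1, 5):
    for c in range(1, 9):
        states = line_states(a, c)
        idle = [s for s in states if not s or min(s) >= a]
        # Theorem 3.4.1: ceiling of (c-1)/a; Theorem 3.4.2: floor of (c-1)/a
        assert max(len(s) for s in states) == -(-(c - 1) // a)
        assert max(len(s) for s in idle) == (c - 1) // a

print(max(len(s) for s in line_states(2, 4)))   # 2, attained by (1, 3)
```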


Lemma 3.4.1  $\Lambda(a, i - 1) \subset \Lambda(a, i)$, where $a \ge 1$ and $i > a$.

This lemma is obvious since $i > i - 1$ and the address cycles are identical.


Theorem 3.4.3  The total number of distinct line states for module characteristics $(a, c)$ with $c > a$ is

$$N(a, c) = N(a, c - 1) + N(a, c - a),$$

where for $1 \le c \le a$, $N(a, c) = c$.

Proof: A module which accepts a request goes through all $c - 1$ busy module states sequentially in ascending order. In order to prove the recursive relationship, consider that there are $c - 1$ slots in which an object representing a busy module on a line can be placed. The slots are numbered sequentially from 1 to $c - 1$. An object is placed in slot $r \in \{1, 2, \ldots, c - 1\}$, if a module on that line is busy because it accepted a request $r$ STUs ago. Hence $N(a, i)$ is the number of distinct ways of placing identical objects in $i - 1$ slots such that if there is an object in slot $r_i$ and another in $r_j$, then $|r_i - r_j| \ge a$. Note that $N(a, i)$ includes the case in which no object is placed in any of the slots, corresponding to all modules on that line being idle.

The memory cycle determines how long a module remains busy, hence $\Lambda(a, c) = \{\emptyset, (1), (2), \ldots, (c - 1)\}$, for $1 \le c \le a$. Hence for $1 \le c \le a$, $N(a, c) = c$. $\Lambda(a, i - 1) \subset \Lambda(a, i)$, for $i > a$, hence $N(a, i) = N(a, i - 1)$ + the number of states due to the effect of increasing the memory cycle by 1. Therefore, consider the effect of adding an $(i - 1)$th slot to the $i - 2$ existing slots. This added slot corresponds to increasing the memory cycle from $i - 1$ to $i$. All new combinations of placing identical objects in the slots must contain an object in slot $i - 1$. However, the presence of an object in slot $i - 1$ precludes the placement of an object in any of the previous $a - 1$ slots. Hence the slots that effectively partake in the placement of objects due to the addition of the $(i - 1)$th slot are $1, 2, 3, \ldots, i - 1 - a$. However, the number of distinct ways of placing objects in $i - a - 1$ slots, subject to the rule that no two objects are less than $a$ STU apart, is $N(a, i - a)$. Hence, for $i = c$, $N(a, c) = N(a, c - 1) + N(a, c - a)$. □

Notice that N(1, c) is a binary series in c and N(2, c) is the Fibonacci series. For properties of the general series, see W(a, 1, c) in [10]. It can be easily shown that N(i, i) = i and N(i, i + 1) = i + 1, for i ≥ 1. As an illustration, consider the case of (a, c) = (2, 6).

N(2, 6) = N(2, 5) + N(2, 4), N(2, 5) = N(2, 4) + N(2, 3) and N(2, 4) = N(2, 3) + N(2, 2). Hence N(2, 6) = 3N(2, 3) + 2N(2, 2) = 3×3 + 2×2 = 13.
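The recurrence of Theorem 3.4.3 and the slot-placement argument in its proof can be cross-checked numerically. The following is a minimal sketch, not part of the original report; the function names are illustrative. It evaluates N(a, c) from the recurrence and, independently, by enumerating placements of objects in slots 1 to c − 1 with every pair at least a slots apart.

```python
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def count_line_states(a: int, c: int) -> int:
    """N(a, c) via N(a, c) = N(a, c-1) + N(a, c-a), with N(a, c) = c for 1 <= c <= a."""
    if a < 1 or c < 1:
        raise ValueError("a and c must be positive integers")
    if c <= a:
        return c
    return count_line_states(a, c - 1) + count_line_states(a, c - a)


def count_line_states_brute_force(a: int, c: int) -> int:
    """Enumerate object placements in slots 1..c-1 with every pair at least a apart."""
    total = 0
    for size in range(c):  # 0 .. c-1 objects; size 0 is the all-idle case
        for placement in combinations(range(1, c), size):
            # combinations() yields ascending tuples, so checking neighbours suffices
            if all(hi - lo >= a for lo, hi in zip(placement, placement[1:])):
                total += 1
    return total


if __name__ == "__main__":
    print(count_line_states(2, 6))              # the (a, c) = (2, 6) illustration
    print(count_line_states_brute_force(2, 6))  # same count by enumeration
```

The enumeration grows exponentially in c and is only meant as a sanity check for small module characteristics; the recurrence is the practical route.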

There is no known closed form solution for N(a, c) derivable from the recursive relation. However, N(a, c) can be rederived differently.

Theorem 3.4.4  The number of distinct line states with β elements in A(a, c) is

$$n(\beta) = \sum_{j_{\beta-1}=1}^{c-(\beta-1)a-1} \; \sum_{j_{\beta-2}=1}^{j_{\beta-1}} \cdots \sum_{j_2=1}^{j_3} \; \sum_{j_1=1}^{j_2} j_1, \quad \beta \ge 2,$$

n(0) = 1, n(1) = c − 1.

Proof: It is immediately obtained that n(0) = 1 (state φ) and n(1) = c − 1 (states (1), (2), ..., (c − 1)). Assume that the elements of a line state are listed in ascending order. Let $r_k$ represent the kth element of a line state with β elements. Then $r_{k+1} - r_k \ge a$, for 0 < k < β. The first element, $r_1$, can take a value between 1 and c − (β − 1)a − 1 inclusive. Hence the second element, $r_2$, can take a value between $r_1 + a$ and c − (β − 2)a − 1 inclusive. In general, the kth element can take a value between $r_{k-1} + a$ and c − (β − k)a − 1 for 1 < k ≤ β.

Hence,

$$n(\beta) = \sum_{r_1=1}^{c-(\beta-1)a-1} \; \sum_{r_2=r_1+a}^{c-(\beta-2)a-1} \cdots \sum_{r_{\beta-1}=r_{\beta-2}+a}^{c-a-1} \; \sum_{r_\beta=r_{\beta-1}+a}^{c-1} 1.$$

Let $j_1 = c - a - r_{\beta-1}$. If $r_{\beta-1} = r_{\beta-2} + a$, $j_1 = c - 2a - r_{\beta-2}$. Similarly, if $r_{\beta-1} = c - a - 1$, $j_1 = 1$. With this change of variable, the innermost two summations can be rewritten thus:

$$\sum_{r_{\beta-1}=r_{\beta-2}+a}^{c-a-1} \; \sum_{r_\beta=r_{\beta-1}+a}^{c-1} 1 = \sum_{r_{\beta-1}=r_{\beta-2}+a}^{c-a-1} (c - a - r_{\beta-1}) = \sum_{j_1=1}^{c-2a-r_{\beta-2}} j_1.$$

In general, let $j_i = c - ia - r_{\beta-i}$, for 2 ≤ i ≤ β − 2, and $j_{\beta-1} = r_1$. If for 2 ≤ i ≤ β − 2, $r_{\beta-i} = r_{\beta-i-1} + a$, then $j_i = c - (i+1)a - r_{\beta-i-1} = j_{i+1}$, and if $r_{\beta-i} = c - ia - 1$, then $j_i = 1$. Hence applying these changes of variables, as in the above example, to n(β),

$$n(\beta) = \sum_{j_{\beta-1}=1}^{c-(\beta-1)a-1} \; \sum_{j_{\beta-2}=1}^{j_{\beta-1}} \cdots \sum_{j_1=1}^{j_2} j_1. \qquad \Box$$

Corollary  The total number of distinct line states in A(a, c) is

$$N(a, c) = \sum_{\beta \ge 0} n(\beta).$$
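The nested sums of Theorem 3.4.4 are easy to misread, so a small numerical cross-check may help. The sketch below is illustrative only (function names are assumptions, not from the report): it evaluates n(β) by the nested sums and compares it with a direct count of β-element line states, that is, ascending tuples drawn from 1 to c − 1 whose neighbouring elements differ by at least a.

```python
from itertools import combinations


def n_states(a: int, c: int, beta: int) -> int:
    """n(beta) from the nested sums of Theorem 3.4.4 (n(0) = 1, n(1) = c - 1)."""
    if beta == 0:
        return 1
    if beta == 1:
        return c - 1

    def nested(level: int, upper: int) -> int:
        # level 1 is the innermost sum, whose summand is j_1 itself;
        # an upper limit below 1 gives an empty sum, i.e. 0
        if level == 1:
            return sum(range(1, upper + 1))
        return sum(nested(level - 1, j) for j in range(1, upper + 1))

    return nested(beta - 1, c - (beta - 1) * a - 1)


def n_states_brute_force(a: int, c: int, beta: int) -> int:
    """Count beta-element line states directly: ascending elements in 1..c-1, gaps >= a."""
    return sum(
        1
        for state in combinations(range(1, c), beta)
        if all(hi - lo >= a for lo, hi in zip(state, state[1:]))
    )


if __name__ == "__main__":
    counts = [n_states(2, 6, beta) for beta in range(4)]
    print(counts, sum(counts))  # per-beta counts and their total for (a, c) = (2, 6)
```

Summing n(β) over all β reproduces N(a, c), which for (a, c) = (2, 6) agrees with the value 13 obtained from the recurrence.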




This result follows from the preceding theorems. The following corollaries are results for some specific cases, using the formulas

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \quad \text{and} \quad \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$$

and Theorem 3.4.4.


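Corollary 3.4.4.2  When a line state in A(a, c) can contain at most one element, N(a, c) = c. □

Corollary 3.4.4.3  When a line state in A(a, c) can contain at most two elements,

$$N(a, c) = c + \frac{(c-a-1)(c-a)}{2}. \qquad \Box$$

Corollary 3.4.4.4  When a line state in A(a, c) can contain at most three elements,

$$N(a, c) = c + \frac{(c-a-1)(c-a)}{2} + \frac{(c-2a-1)(c-2a)(c-2a+1)}{6}. \qquad \Box$$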


The above corollaries are examples of closed form solutions of N(a, c) for three different classes of module characteristics. It is obvious that the magnitude of N(a, c) increases nonlinearly with


3.5  Probability of Acceptance, $P_A(a, c, p)$

No closed form solution for $P_A(a, c, p)$ exists for the general module characteristics (a, c), even for p = 1. We must know the (a, c) values and obtain the reduced line state graph, $G'_L(a, c)$, in order to solve the Markov state diagram for the probability of acceptance, $P_A(a, c, p)$, for p = 1. The technique used requires the computation of the steady state probability of being in an acceptance line state. This method is not readily applicable to the general case for p ≥ 1.

In this section, the steady state probability of acceptance, $P_A(a, c, p)$ for p ≥ 1, is discussed for certain classes of module characteristics. Some of the results discussed here were presented in [24]. It is also shown that the complexity of $P_A(a, c, p)$ increases with the maximum number of busy modules on an idle line.

However, closed form expressions exist for $P_A(a, c, p)$ in the following cases: c ≥ a = 1 and a ≤ c ≤ 2a. These classes of module characteristics cover more than 78% of the possible cases of module characteristics, if c ≤ 10.
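The coverage figure quoted above can be reproduced by simple counting. The sketch below assumes, and this is an interpretation rather than something the report states explicitly, that the "possible cases" are the integer pairs (a, c) with 1 ≤ a ≤ c and c up to 10 inclusive; under that assumption the two closed-form classes account for 43 of 55 pairs, about 78%.

```python
def closed_form_coverage(c_max: int) -> tuple[int, int]:
    """Return (covered, total) over integer pairs 1 <= a <= c <= c_max.

    A pair counts as covered when a = 1 (any c >= 1) or when a <= c <= 2a.
    The choice of the pair universe is an assumption made for illustration.
    """
    pairs = [(a, c) for c in range(1, c_max + 1) for a in range(1, c + 1)]
    covered = [(a, c) for a, c in pairs if a == 1 or c <= 2 * a]
    return len(covered), len(pairs)


if __name__ == "__main__":
    covered, total = closed_form_coverage(10)
    print(covered, total, round(100 * covered / total, 1))
```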

In Chapter 2, it was shown that a request can be rejected due to three types of memory collisions. For convenience, the three types of memory collisions are repeated here. A request made to the memory system may be rejected due to

1. MALC: Multiple Access Line Collision (only if p > 1),

2. BLC: Busy Line Collision (only if a > 1), or

3. BMC: Busy Module Collision (only if c > a).

Recall that p requests are issued simultaneously. Hence if more than one request addresses the same line, only one of these may be accepted. Let $P_1$, $P_2$ and $P_3$ be the probabilities of rejection of a request due to MALC, BLC and BMC respectively. Then the probability of a request being rejected by the memory system can be obtained by considering all the mutually exclusive and independent rejection and nonrejection events.


Theorem 3.5.1  The probability of rejection of a request issued to the memory system whose module characteristics are (a, c) is

$$P_R(a, c, p) = P_1 + (1 - P_1)P_2 + (1 - P_1)(1 - P_2)P_3. \qquad \Box$$

Notice that $1 - P_1$ and $1 - P_2$ are the probabilities that a rejection does not occur (nonrejection) due to MALC and BLC respectively.

Corollary 3.5.1.1  The probability of acceptance of a request made to the memory whose module characteristics are (a, c) is

$$P_A(a, c, p) = 1 - P_R(a, c, p). \qquad \Box$$

For brevity, $P_A(a, c, p)$ and $P_R(a, c, p)$ will sometimes be written as $P_A$ and $P_R$ respectively. Hence in order to obtain $P_A(a, c, p)$, it is sufficient to know $P_1$, $P_2$ and $P_3$. Notice that $P_A(a, c, p) = (1 - P_1)(1 - P_2)(1 - P_3)$.
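The product form of the acceptance probability follows from Theorem 3.5.1 by factoring; the short derivation below spells out the step, using only the symbols already defined in this section.

```latex
\begin{align*}
P_A &= 1 - P_R \\
    &= 1 - P_1 - (1 - P_1)P_2 - (1 - P_1)(1 - P_2)P_3 \\
    &= (1 - P_1)\bigl[\,1 - P_2 - (1 - P_2)P_3\,\bigr] \\
    &= (1 - P_1)(1 - P_2)(1 - P_3).
\end{align*}
```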

Lemma 3.5.1  The probability of a request being rejected due to a multiple access line collision (MALC) with one or more of the p − 1 other simultaneous requests is

$$P_1 = 1 - \frac{\ell}{p}\left[1 - \left(1 - \frac{1}{\ell}\right)^{p}\right].$$

Proof: There are $\ell^p$ and $(\ell - 1)^p$ distinct ways of mapping p requests to $\ell$ and $\ell - 1$ lines respectively. Hence, there are $[\ell^p - (\ell - 1)^p]$ maps which reference a particular line at least once. Thus the expected number of distinct lines referenced by p memory requests, i.e., the bandwidth, is $\ell[\ell^p - (\ell-1)^p]/\ell^p = \ell[1 - (1 - 1/\ell)^p]$. The probability $P_1 = 1 - \frac{\ell}{p}[1 - (1 - 1/\ell)^p]$. □
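The counting argument in the proof of Lemma 3.5.1 can be verified by exhaustive enumeration for small cases. The sketch below is illustrative (names are not from the report): it computes $P_1$ from the closed form and compares $1 - P_1$ with the expected number of distinct lines referenced per request, obtained by listing all $\ell^p$ equally likely mappings of p requests onto $\ell$ lines.

```python
from itertools import product


def p_malc(lines: int, p: int) -> float:
    """P_1 of Lemma 3.5.1: rejection probability due to a multiple access line collision."""
    return 1.0 - (lines / p) * (1.0 - (1.0 - 1.0 / lines) ** p)


def expected_distinct_lines(lines: int, p: int) -> float:
    """Average number of distinct lines hit, over all lines**p request-to-line mappings."""
    total = 0
    for mapping in product(range(lines), repeat=p):
        total += len(set(mapping))
    return total / lines ** p


if __name__ == "__main__":
    lines, p = 4, 3
    print(1.0 - p_malc(lines, p))                 # closed form for 1 - P_1
    print(expected_distinct_lines(lines, p) / p)  # enumeration, per request
```

Since at most one request per referenced line survives the MALC check, the fraction of requests that pass is the expected number of distinct lines divided by p, which is what the two printed values compare.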




It is interesting to note that $1 - P_1$ is a closed form representation of Ravi's results in [18]. Chang showed the equivalence of $1 - P_1$ and Ravi's result in [25].

A request will be rejected due to BLC if it has no MALC, but references a busy line.

Lemma 3.5.2  The probability of a request referencing a busy line is

$$P_2 = \frac{p(a-1)P_A}{\ell}.$$

Proof: The expected number of busy lines which can cause rejection of an incoming request is equal to the expected number of accepted requests in the most recent a − 1 STUs, which is $p(a-1)P_A$. Hence $P_2 = p(a-1)P_A/\ell$. □


Corollary 3.5.2.1  The expected number of idle lines is $\ell_{idle} = \ell$ − expected number of busy lines, hence

$$\ell_{idle} = \ell - p(a-1)P_A, \quad \text{and} \quad \ell_{idle} = \ell(1 - P_2).$$


The computation of the probability of referencing a busy module on an idle line, $P_3$, is not always straightforward. However, $P_3$ can be generalized by the next lemma.

Lemma 3.5.3  The probability of referencing a busy module on an idle line is

$$P_3 = \frac{E(BM/IL)}{E(M/IL)},$$

where E(BM/IL) is the expected number of busy modules on idle lines and E(M/IL) is the expected number of modules on idle lines. □

This formula can be easily derived from simple probability theory. Notice that $E(M/IL) = N(1 - P_2)$.

The derivation of E(BM/IL) for given (a, c) can be made from the reduced line state graphs, $G'_L(a, c)$. Notice that previously, the reduced line state graph was used to compute $P_A(a, c, p)$ for p = 1. However, E(BM/IL) involves the general case for p ≥ 1. Since all $\ell$ lines are identical and independent, the reduced line state graphs for all $\ell$ lines are identical and independent. Hence the transition probability between any two states in one line state graph is identical to the transition probability between the same two states in another line state graph of the same module characteristics, (a, c). At steady state, if p requests are issued, $pP_A$ requests are accepted. These accepted requests cause the addressed lines to make transitions to acceptance line states. Since the request references are uniformly distributed over the $\ell$ lines, the accepted requests will be uniformly distributed over all $\ell$ lines and hence over all acceptance line states in all $\ell$ line state graphs.

For the whole system, let us collapse all $\ell$ line state graphs into one. The resulting state graph is identical to a line state graph and will therefore be referred to as a line state graph. The following definitions are made to aid in the derivation of E(BM/IL) for the whole system.

Definition 3.5.1  For the whole system, E(X) is the expected number of lines at any time instant which make transitions into line state $X \in G'_L(a, c)$. □

Definition 3.5.2  $A'_I(a, c)$ is the set of nonnull idle line states in $G'_L(a, c)$. Hence $A'_I(a, c) = \{X \mid X \in G'_L(a, c),\ \text{if } r \in X \text{ then } r \ge a,\ X \ne \phi\}$. □


Lemma 3.5.4

$$\sum_{\substack{X \in G'_L(a, c) \\ X = \text{acceptance state}}} E(X) = pP_A. \qquad \Box$$

This lemma is obvious since accepted requests cause transitions of the addressed lines into acceptance line states.


Lemma 3.5.5

$$E(BM/IL) = \sum_{X \in A'_I(a, c)} |X| \cdot E(X). \qquad \Box$$

This lemma follows from the definition of E(BM/IL). The expected number of busy modules on idle lines is the expected number of busy modules on lines which make transitions to nonnull idle line states.

For the case a = 1, some further results are easily obtained. In this case, the busy modules are uniformly distributed over the $\ell$ lines and no lines are busy. Hence $\ell_{idle} = \ell$ at all times.


Lemma 3.5.6  For a = 1, the probability of a request referencing a busy module is

$$P_3 = \frac{p(c-1)P_A}{N}.$$

Proof: Since the busy modules are uniformly distributed on the lines for a = 1, the expected number of busy modules which can cause rejection of an incoming request is equal to the expected number of accepted requests in the most recent c − 1 STUs, which is $p(c-1)P_A$. Since there are a total of N memory modules in the system,

$$P_3 = \frac{p(c-1)P_A}{N}. \qquad \Box$$

For the special case a = c = 1, $P_2 = P_3 = 0$. Hence $P_R(a, c, p) = P_1$ and $P_A(a, c, p) = 1 - P_1 = \frac{\ell}{p}[1 - (1 - 1/\ell)^p]$. This is the only case considered by previous investigators [19, 20, 21, 22]. However, for the case c ≥ a = 1, the probability of acceptance can be obtained from the following theorem.


Theorem 3.5.2  For a = 1, c ≥ 1, the probability of acceptance, $P_A(a, c, p)$, of any $(\ell, m)$ memory configuration is

$$P_A(1, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k_2}, \quad \text{where } k_2 = \frac{p(c-1)}{N}.$$


Proof: To prove this, note that since a = 1, there will be no rejections due to busy lines. The probability of rejection, $P_R$, of the $(\ell, m)$ memory organization with module characteristics (a = 1, c ≥ 1) can now be easily obtained since rejection can only occur due to a multiple access line collision or a busy module request:

$$P_R = 1 - P_A = P_1 + (1 - P_1)P_3$$

and

$$1 - P_A = P_1 + (1 - P_1)\frac{p(c-1)P_A}{N}.$$

Hence

$$P_A(1, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k_2}, \quad k_2 = \frac{p(c-1)}{N}. \qquad \Box$$


For the special case of p = 1, a = 1, $P_1 = 0$, and

$$P_A(1, c, 1) = \frac{1}{1 + k_2} = \frac{N}{N + c - 1}.$$

Hence, in this case, the performance of a memory with a = 1 is a function of N (= $\ell m$) and c only and not of the $(\ell, m)$ memory configuration.
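Theorem 3.5.2 is straightforward to evaluate numerically. The following minimal sketch (function and parameter names are illustrative, not from the report) computes $P_A(1, c, p)$ and lets one confirm that the result is consistent with $P_A = (1 - P_1)(1 - P_3)$ when $P_3 = p(c-1)P_A/N$ as in Lemma 3.5.6.

```python
def p_malc(lines: int, p: int) -> float:
    """P_1 of Lemma 3.5.1."""
    return 1.0 - (lines / p) * (1.0 - (1.0 - 1.0 / lines) ** p)


def p_accept_a1(lines: int, modules_per_line: int, c: int, p: int) -> float:
    """P_A(1, c, p) of Theorem 3.5.2 for an (l, m) configuration with N = l * m modules."""
    n_modules = lines * modules_per_line
    p1 = p_malc(lines, p)
    k2 = p * (c - 1) / n_modules
    return (1.0 - p1) / (1.0 + (1.0 - p1) * k2)


if __name__ == "__main__":
    # p = 1: the result should reduce to N / (N + c - 1), whatever the (l, m) split of N
    print(p_accept_a1(4, 4, 5, 1), p_accept_a1(16, 1, 5, 1), 16 / 20)
    # p > 1: the configuration matters through P_1
    print(p_accept_a1(4, 4, 5, 4), p_accept_a1(16, 1, 5, 4))
```

The example values (N = 16, c = 5) are arbitrary and chosen only to make the p = 1 reduction easy to check by hand.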

Notice that the line state graph was not required to obtain $P_A$ for a = 1. For a = 1, the lines are always idle at any time instant and are therefore ready to accept a request provided the request is not made to a busy module on the line. This simplifies the problem.

Computing $P_3$ for a > 1 is not an easy task since the busy modules are not uniformly distributed on the $\ell$ lines, although the requests are uniformly distributed over the $\ell$ lines. However, $P_3$ can be calculated for certain classes of (a, c) by classifying the line state graphs. The transition probabilities in a line state graph are either (m − |X|)/N, 1 − (m − |X|)/N or 1. For an idle line state, $X \in A_I(a, c)$, $|X| \le \lfloor (c-1)/a \rfloor$, hence the lower limit of the probability of transition from an idle line state, X ≠ φ, to its successor acceptance state is $(m - \lfloor (c-1)/a \rfloor)/N$. Hence, for a > 1, the probability of transition from idle line state X to its successor acceptance state $f_a(X)$ in $G_L(a, c)$ is $\alpha_j = (m - j)/N$, where j = |X| and $0 \le j \le \lfloor (c-1)/a \rfloor$. Consider first the case where $\lfloor (c-1)/a \rfloor$ is at most 1.

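Theorem 3.5.3  For a ≤ c ≤ 2a, a > 1, the probability of acceptance, $P_A(a, c, p)$, of any $(\ell, m)$ memory configuration is given by

$$P_A(a, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k_1 + \dfrac{p(1 - P_1)}{N\alpha_1}\left[1 - (1 - \alpha_1)^{c-a}\right]}, \quad \alpha_1 > 0,$$

where $k_1 = \dfrac{p(a-1)}{\ell}$ and $\alpha_1 = \dfrac{m-1}{N}$.

Also,

$$P_A(a, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k_2}, \quad \alpha_1 = 0, \quad k_2 = \frac{p(c-1)}{N}.$$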

Proof: For a < c ≤ 2a and a > 1, $\lfloor (c-1)/a \rfloor = 1$. Hence there is exactly one busy module on a nonnull idle line state in A(a, c). Figure 3.5.1 shows the line state graph, $G_L(a, c)$, for a < c ≤ 2a, a > 1. It can easily be seen that each busy line state, X, with two elements is equivalent to a busy line state with one element r, such that r is the smallest element in X. For example, (1, a + 1) ~ (1, a + 2) ~ (1, a + 3) ~ ... ~ (1, c − 1) ~ (1). Hence such states can be merged, which results in the reduced line state graph $G'_L(a, c)$ for a < c ≤ 2a, a > 1 shown in figure 3.5.2. The probability of transition from a nonnull idle line state to the acceptance state, (1), is $\alpha_1 = (m - 1)/N$.

The expected number of busy modules on idle lines, E(BM/IL), can be obtained from $G'_L(a, c)$. The expected number of accepted requests every STU is $pP_A$. Let E(X) denote the expected number of lines which make transitions into line state X. Hence $E(1) = pP_A$, since X = (1) is the only acceptance state in $G'_L(a, c)$. Let $P_X$ denote the steady state probability of being in state X. From $G'_L(a, c)$ in figure 3.5.2, $P_{(1)} = P_{(2)} = \cdots = P_{(a)}$, since the transition probability from state (r) to (r + 1) is 1 for 1 ≤ r ≤ a − 1. Hence

$$E(1) = E(2) = \cdots = E(a) = pP_A.$$

The states that represent busy modules on idle lines are (a), (a + 1), ..., (c − 2) and (c − 1), each of which has one busy module. Hence

$$E(BM/IL) = \sum_{i=a}^{c-1} E(i).$$

Note that X = φ does not represent a busy module on an idle line, hence it is not included in E(BM/IL). From the nonacceptance transition of nonnull idle line states,

$$E(a+1) = (1-\alpha_1)E(a), \quad E(a+2) = (1-\alpha_1)E(a+1), \quad \ldots, \quad E(c-1) = (1-\alpha_1)E(c-2).$$

Hence in general, $E(r) = (1 - \alpha_1)E(r-1)$, for a + 1 ≤ r ≤ c − 1. Thus, $E(r) = (1 - \alpha_1)^{r-a}E(a)$, for a ≤ r ≤ c − 1. Therefore

$$E(BM/IL) = \sum_{r=a}^{c-1} (1-\alpha_1)^{r-a} E(a).$$

Since $E(a) = pP_A$,

$$E(BM/IL) = pP_A \sum_{r=a}^{c-1} (1-\alpha_1)^{r-a} = pP_A \sum_{i=0}^{c-a-1} (1-\alpha_1)^{i} = \frac{pP_A}{\alpha_1}\left[1 - (1-\alpha_1)^{c-a}\right], \quad \alpha_1 > 0.$$

From previous discussion, $E(M/IL) = N(1 - P_2)$. Hence

$$P_3 = \frac{pP_A\left[1 - (1-\alpha_1)^{c-a}\right]}{N\alpha_1(1 - P_2)}, \quad \alpha_1 > 0.$$

Therefore

$$P_R = 1 - P_A = P_1 + (1 - P_1)P_2 + \frac{(1 - P_1)\,pP_A}{N\alpha_1}\left[1 - (1-\alpha_1)^{c-a}\right].$$

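Substituting for $P_2 = \dfrac{p(a-1)P_A}{\ell}$ and manipulating,

$$\frac{1 - P_1}{P_A} = 1 + \frac{(1 - P_1)\,p(a-1)}{\ell} + \frac{(1 - P_1)\,p\left[1 - (1-\alpha_1)^{c-a}\right]}{N\alpha_1},$$

where $\alpha_1 = \dfrac{m-1}{N} > 0$. Letting $k_1 = \dfrac{p(a-1)}{\ell}$ and solving for $P_A$, the theorem is proved for $\alpha_1 > 0$.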
However, when $\alpha_1 = 0$, m = 1 and $\ell = N$, figure 3.5.3 shows the resulting reduced line state graph. In this case, it is easy to show that $E(BM/IL) = p(c - a)P_A$. Hence,

$$P_3 = \frac{p(c-a)P_A}{N(1 - P_2)}, \quad \alpha_1 = 0.$$

Substituting for $P_2$ and $P_3$,

$$P_A(a, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k_2}, \quad \text{where } k_2 = \frac{p(c-1)}{N}.$$

For a = c, $P_3 = 0$, hence

$$P_A(a, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k_1},$$

derivable from the two results above for $\alpha_1 > 0$ and $\alpha_1 = 0$ by substituting a = c in the equations. □
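As a hedged illustration of Theorem 3.5.3, the sketch below evaluates both branches ($\alpha_1 > 0$ and $\alpha_1 = 0$) and exposes the intermediate $P_2$ and $P_3$ so that the result can be checked against $P_A = (1 - P_1)(1 - P_2)(1 - P_3)$. Function names and the example configuration are assumptions made for illustration.

```python
def p_malc(lines: int, p: int) -> float:
    """P_1 of Lemma 3.5.1."""
    return 1.0 - (lines / p) * (1.0 - (1.0 - 1.0 / lines) ** p)


def p_accept_one_busy(lines: int, m: int, a: int, c: int, p: int) -> float:
    """P_A(a, c, p) of Theorem 3.5.3, valid for a > 1 and a <= c <= 2a."""
    if not (a > 1 and a <= c <= 2 * a):
        raise ValueError("Theorem 3.5.3 requires a > 1 and a <= c <= 2a")
    n_modules = lines * m
    p1 = p_malc(lines, p)
    alpha1 = (m - 1) / n_modules
    if alpha1 > 0:
        k1 = p * (a - 1) / lines
        busy_term = p * (1.0 - (1.0 - alpha1) ** (c - a)) / (n_modules * alpha1)
        return (1.0 - p1) / (1.0 + (1.0 - p1) * (k1 + busy_term))
    # alpha1 = 0 means m = 1 and lines = N
    k2 = p * (c - 1) / n_modules
    return (1.0 - p1) / (1.0 + (1.0 - p1) * k2)


def rejection_components(lines: int, m: int, a: int, c: int, p: int):
    """Return (P_1, P_2, P_3) implied by the theorem for alpha1 > 0."""
    n_modules = lines * m
    alpha1 = (m - 1) / n_modules
    pa = p_accept_one_busy(lines, m, a, c, p)
    p2 = p * (a - 1) * pa / lines
    p3 = p * pa * (1.0 - (1.0 - alpha1) ** (c - a)) / (n_modules * alpha1 * (1.0 - p2))
    return p_malc(lines, p), p2, p3


if __name__ == "__main__":
    # hypothetical configuration: 8 lines of 4 modules, (a, c) = (2, 4), 4 processors
    pa = p_accept_one_busy(8, 4, 2, 4, 4)
    p1, p2, p3 = rejection_components(8, 4, 2, 4, 4)
    print(pa, (1 - p1) * (1 - p2) * (1 - p3))
```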


It is seen that the solution of $P_A(a, c, p)$ for a ≤ c ≤ 2a, a > 1 is fairly involved. There is no known general solution of $P_A(a, c, p)$ for general module characteristics. The next higher class of module characteristics presents a more complex state graph. This is the case for $\lfloor (c-1)/a \rfloor = 2$, where 2a < c ≤ 3a and a > 1. An exact solution for $P_A(a, c, p)$ was not obtained because of the complexity involved. Instead an approximate solution for large m was favored, although this is still relatively complex. The complexity arises in obtaining $P_3$. Notice that for large m, $\alpha_1 \approx \alpha_2$. Hence assume that $\alpha_1 = \alpha_2 = \alpha$.


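Theorem 3.5.4  If $\alpha_1 \approx \alpha_2$, and (a, c) is such that 2a < c ≤ 3a, a > 1,

$$P_A(a, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k_1 + \dfrac{p(1 - P_1)}{N\alpha}\left[1 - (1-\alpha)^{c-a} + k_2\right]},$$

for α ≠ 0, where $k_1 = \dfrac{p(a-1)}{\ell}$ and $k_2 = 1 - [1 + \alpha(c - 2a)](1 - \alpha)^{c-2a}$.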

Proof: Figure 3.5.4 illustrates the reduced state graph, $G'_L(a, c)$, for 2a < c ≤ 3a. An example is shown for the (a, c) = (2, 6) case in figure 3.5.5. In the following analysis, $\alpha_1$ and $\alpha_2$ are replaced by α. Applying lemma 3.5.4 to $G'_L(a, c)$ of figure 3.5.4,

$$E(1) + \sum_{i=0}^{c-2a-1} E(1, a+i+1) = pP_A. \tag{1}$$

From an investigation of the idle line states representing the busy modules on idle lines in $G'_L(a, c)$ and applying lemma 3.5.5,

$$E(BM/IL) = \sum_{i=0}^{c-a-1} E(a+i) + 2\sum_{i=0}^{c-2a-1} \; \sum_{j=0}^{c-2a-i-1} E(a+j, 2a+j+i). \tag{2}$$

But $E(a+j, 2a+j+i) = (1-\alpha)^j E(a, 2a+i)$ and $E(a, 2a+i) = E(1, a+i+1)$, hence substituting and using the relation $\sum_{j=0}^{n-1} r^j = \dfrac{1 - r^n}{1 - r}$,

$$E(BM/IL) = \sum_{i=0}^{c-a-1} E(a+i) + 2\sum_{i=0}^{c-2a-1} \frac{1 - (1-\alpha)^{c-2a-i}}{\alpha}\, E(1, a+i+1). \tag{3}$$

Considering the single element idle line states in the state graph,

$$E(a+i+1) = (1-\alpha)E(a+i) + (1-\alpha)E(a+i, c-1), \quad 0 \le i \le c-2a-1, \tag{4}$$

and

$$E(a+i+1) = (1-\alpha)^{2a+i+1-c} E(c-a), \quad c-2a-1 \le i \le c-a-2.$$

This can be rewritten as

$$E(c-a+i) = (1-\alpha)^{i} E(c-a), \quad 0 \le i \le a-1.$$

Hence

$$\sum_{i=c-2a-1}^{c-a-2} E(a+i+1) = \sum_{i=0}^{a-1} E(c-a+i) = \sum_{i=0}^{a-1} (1-\alpha)^{i} E(c-a),$$

$$\sum_{i=c-2a-1}^{c-a-2} E(a+i+1) = \frac{1 - (1-\alpha)^{a}}{\alpha}\, E(c-a). \tag{5}$$

From the nonacceptance of two-element states,

$$E(a+i, c-1) = (1-\alpha)^{i} E(1, c-a-i), \quad 0 \le i \le c-2a-1. \tag{6}$$

Substituting for E(a+i, c−1) in equation 4,

$$E(a+i+1) = (1-\alpha)\left[E(a+i) + (1-\alpha)^{i} E(1, c-a-i)\right]. \tag{7}$$

It can be shown from equation 7 that in general,

$$E(a+i) = (1-\alpha)^{i}\left[E(a) + \sum_{j=c-2a-i}^{c-2a-1} E(1, a+j+1)\right], \quad 1 \le i \le c-2a. \tag{8}$$

Considering the transitions to the two-element acceptance states, for 0 ≤ i ≤ c − 2a − 1,

$$E(1, a+i+1) = \alpha\left[E(a+i) + \sum_{j=0}^{c-2a-i-1} E(a+i, 2a+i+j)\right]. \tag{9}$$

But $E(a+i, 2a+i+j) = (1-\alpha)^{i} E(a, 2a+j) = (1-\alpha)^{i} E(1, a+j+1)$, hence, substituting for E(a+i, 2a+i+j) in equation 9,

$$E(1, a+i+1) = \alpha\left[E(a+i) + \sum_{j=0}^{c-2a-i-1} (1-\alpha)^{i} E(1, a+j+1)\right]. \tag{10}$$

Substituting for E(a+i) from equation 8,

$$E(1, a+i+1) = \alpha(1-\alpha)^{i}\left[E(a) + \sum_{j=0}^{c-2a-1} E(1, a+j+1)\right]. \tag{11}$$

Since E(a) = E(1) and from equation 1,

$$E(1, a+i+1) = \alpha(1-\alpha)^{i}\, pP_A, \quad 0 \le i \le c-2a-1. \tag{12}$$


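The first term of equation 3 can be rewritten as

$$\sum_{i=0}^{c-a-1} E(a+i) = \sum_{i=0}^{c-2a-1} E(a+i) + \sum_{i=c-2a}^{c-a-1} E(a+i). \tag{13}$$

From equation 8,

$$E(c-a) = (1-\alpha)^{c-2a}\left[E(a) + \sum_{j=0}^{c-2a-1} E(1, a+j+1)\right].$$

Recognizing the bracketed term as $pP_A$,

$$E(c-a) = (1-\alpha)^{c-2a}\, pP_A. \tag{14}$$

From equations 1 and 12,

$$E(1) + \alpha\, pP_A \sum_{i=0}^{c-2a-1} (1-\alpha)^{i} = pP_A.$$

Since E(1) = E(a), evaluating the above expression,

$$E(1) = E(a) = (1-\alpha)^{c-2a}\, pP_A. \tag{15}$$

From equations 8, 12, and 15,

$$E(a+i) = (1-\alpha)^{i}\left[(1-\alpha)^{c-2a}\, pP_A + \sum_{j=c-2a-i}^{c-2a-1} \alpha(1-\alpha)^{j}\, pP_A\right]$$

$$E(a+i) = (1-\alpha)^{i}(1-\alpha)^{c-2a-i}\, pP_A$$

$$E(a+i) = (1-\alpha)^{c-2a}\, pP_A, \quad 0 \le i \le c-2a. \tag{16}$$

Combining equations 5, 14, and 16, equation 13 becomes

$$\sum_{i=0}^{c-a-1} E(a+i) = \sum_{i=0}^{c-2a-1} (1-\alpha)^{c-2a}\, pP_A + \frac{1 - (1-\alpha)^{a}}{\alpha}\,(1-\alpha)^{c-2a}\, pP_A$$

$$\sum_{i=0}^{c-a-1} E(a+i) = \left[(c-2a) + \frac{1 - (1-\alpha)^{a}}{\alpha}\right](1-\alpha)^{c-2a}\, pP_A. \tag{17}$$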

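Substituting for E(1, a+i+1) from equation 12 into the last term of equation 3,

$$2\sum_{i=0}^{c-2a-1} \frac{1 - (1-\alpha)^{c-2a-i}}{\alpha}\, \alpha(1-\alpha)^{i}\, pP_A = 2pP_A\left[\sum_{i=0}^{c-2a-1} (1-\alpha)^{i} - \sum_{i=0}^{c-2a-1} (1-\alpha)^{c-2a}\right]$$

$$= 2\left[\frac{1 - (1-\alpha)^{c-2a}}{\alpha} - (c-2a)(1-\alpha)^{c-2a}\right] pP_A. \tag{18}$$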


Combining equations 17 and 18,

$$E(BM/IL) = \frac{2\left[1 - (1-\alpha)^{c-2a}\right] + (1-\alpha)^{c-2a}\left[1 - (1-\alpha)^{a}\right] - \alpha(c-2a)(1-\alpha)^{c-2a}}{\alpha}\; pP_A. \tag{19}$$

Hence, similar to the proof of Theorem 3.5.3,

$$P_A(a, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k_1 + \dfrac{p(1 - P_1)}{N\alpha}\left[1 - (1-\alpha)^{c-a} + k_2\right]}, \quad \text{for } \alpha \ne 0,$$

where $k_1 = \dfrac{p(a-1)}{\ell}$ and $k_2 = 1 - [1 + \alpha(c-2a)](1-\alpha)^{c-2a}$. □

For α = 0, a good approximation is the result of Theorem 3.5.3 for $\alpha_1 = 0$, especially for large N.

In general, the complexity of the analysis increases with $\lfloor (c-1)/a \rfloor$.
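The algebra leading from equations 17 and 18 to the final form of Theorem 3.5.4 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and taking α equal to $\alpha_1 = (m-1)/N$ is an assumption (the theorem only requires $\alpha_1 \approx \alpha_2$). It computes E(BM/IL) per accepted request in two ways and evaluates the approximate $P_A$.

```python
def p_malc(lines: int, p: int) -> float:
    """P_1 of Lemma 3.5.1."""
    return 1.0 - (lines / p) * (1.0 - (1.0 - 1.0 / lines) ** p)


def busy_per_accept_eqs_17_18(a: int, c: int, alpha: float) -> float:
    """E(BM/IL) / (p * P_A) as the sum of equations 17 and 18."""
    x = (1.0 - alpha) ** (c - 2 * a)
    eq17 = ((c - 2 * a) + (1.0 - (1.0 - alpha) ** a) / alpha) * x
    eq18 = 2.0 * ((1.0 - x) / alpha - (c - 2 * a) * x)
    return eq17 + eq18


def busy_per_accept_simplified(a: int, c: int, alpha: float) -> float:
    """Same quantity in the form used by the theorem: [1 - (1-alpha)^(c-a) + k2] / alpha."""
    k2 = 1.0 - (1.0 + alpha * (c - 2 * a)) * (1.0 - alpha) ** (c - 2 * a)
    return (1.0 - (1.0 - alpha) ** (c - a) + k2) / alpha


def p_accept_two_busy(lines: int, m: int, a: int, c: int, p: int) -> float:
    """Approximate P_A(a, c, p) of Theorem 3.5.4 for a > 1, 2a < c <= 3a, m > 1."""
    if not (a > 1 and 2 * a < c <= 3 * a and m > 1):
        raise ValueError("requires a > 1, 2a < c <= 3a and m > 1 (alpha != 0)")
    n_modules = lines * m
    alpha = (m - 1) / n_modules  # assumption: alpha is taken to be alpha_1
    p1 = p_malc(lines, p)
    k1 = p * (a - 1) / lines
    busy = p * busy_per_accept_simplified(a, c, alpha) / n_modules
    return (1.0 - p1) / (1.0 + (1.0 - p1) * (k1 + busy))


if __name__ == "__main__":
    print(busy_per_accept_eqs_17_18(2, 6, 0.1), busy_per_accept_simplified(2, 6, 0.1))
    print(p_accept_two_busy(8, 8, 2, 6, 4))  # hypothetical (l, m) = (8, 8), p = 4
```

Agreement of the two printed E(BM/IL) values confirms only the algebraic simplification, not the accuracy of the $\alpha_1 \approx \alpha_2$ approximation itself.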


3.6  Bounds on $P_A(a, c, p)$

It was seen in the last section that obtaining $P_A(a, c, p)$ for general module characteristics is a formidable problem. However, upper and lower bounds can be obtained for $P_A(a, c, p)$ which give a rough estimate for design purposes.


Theorem 3.6.1  The maximum performance memory configuration for any (a, c) is $(\ell, m) = (N, 1)$ and for this configuration,

$$P_A(a, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k}, \quad k = \frac{p(c-1)}{N}.$$

Proof: It is trivial to show, since increasing $\ell$ cannot decrease performance, that the maximum performance memory configuration is $(\ell, m) = (N, 1)$. In this case, since $\ell = N$, we can use the state transition diagram of figure 3.5.3 to evaluate E(BM/IL). Figure 3.5.3 is repeated in figure 3.6.1 for convenience. Notice that when $\ell = N$, m = 1 and a line cannot accept a request unless the module on that line is idle. From figure 3.6.1,

$$E(BM/IL) = \sum_{i=a}^{c-1} E(i) = (c-a)E(a),$$

since E(i) = E(i+1) for 1 ≤ i ≤ c − 2. $pP_A = E(1) = \cdots = E(a)$. Therefore, $E(BM/IL) = p(c-a)P_A$. Hence,

$$P_3 = \frac{p(c-a)P_A}{N(1 - P_2)}.$$

Using Theorem 3.5.1 and substituting for $P_2 = \dfrac{p(a-1)P_A}{\ell}$ and $\ell = N$,

$$P_A = \frac{1 - P_1}{1 + (1 - P_1)k}, \quad k = \frac{p(c-1)}{N}. \qquad \Box$$

Notice that this result is independent of a. It is intuitively obvious that $P_A(a, c, p)$ is a monotonically nondecreasing function in $\ell$. Although $(\ell, m) = (N, 1)$ is the maximum performance configuration, this configuration is often undesirable for large N because of the cost incurred by increasing $\ell$. Bounds on $P_A(a, c, p)$ for the general memory configuration will be considered.


Theorem 3.6.2  For any (a, c) and p ≥ 1,

$$P_A(a, c, p) \ge \frac{1 - P_1}{A},$$

where $A = 1 + (1 - P_1)(k_1 + k_2)$, $k_1 = \dfrac{p(a-1)}{\ell}$ and $k_2 = \dfrac{p(c-a)}{N}$.

Proof: It is easy to show that at any time instant, the expected number of busy modules is $p(c-1)P_A$. Of these, $p(a-1)P_A$ are clearly on busy lines and of the remaining $p(c-a)P_A$ busy modules, let B be on busy lines, where B ≥ 0. Thus there are $p(c-a)P_A - B$ busy modules on idle lines. Since $P_3$ = (number of busy modules on idle lines)/(number of modules on idle lines),

$$P_3 = \frac{p(c-a)P_A - B}{m\,\ell_{idle}}, \quad \text{where } \ell_{idle} = \ell(1 - P_2),$$

from corollary 3.5.2.1. Using the expression for $P_3$ and Theorem 3.5.1,

$$P_A(a, c, p) = \frac{1 - P_1 + (1 - P_1)B/N}{1 + (1 - P_1)(k_1 + k_2)}.$$

Therefore,

$$P_A(a, c, p) \ge \frac{1 - P_1}{1 + (1 - P_1)(k_1 + k_2)}. \qquad \Box$$


Similarly, an upper bound can be obtained for any (a, c) and p ≥ 1.

Theorem 3.6.3  For any (a, c) and p ≥ 1,

$$P_A(a, c, p) \le \frac{1 - P_1}{1 + (1 - P_1)k_1}, \quad \text{where } k_1 = \frac{p(a-1)}{\ell}.$$

Proof: From the proof of Theorem 3.6.2,

$$P_A(a, c, p) = \frac{1 - P_1 + (1 - P_1)B/N}{1 + (1 - P_1)(k_1 + k_2)},$$

where 0 ≤ B < ∞, $k_1 = \dfrac{p(a-1)}{\ell}$ and $k_2 = \dfrac{p(c-a)}{N}$. The maximum $P_A(a, c, p)$ occurs when N = ∞. In this case,

$$P_A(a, c, p) = \frac{1 - P_1}{1 + (1 - P_1)k_1}.$$

Therefore,

$$P_A(a, c, p) \le \frac{1 - P_1}{1 + (1 - P_1)k_1}. \qquad \Box$$

Hence for any (a, c) and p ≥ 1,

$$\frac{1 - P_1}{1 + (1 - P_1)(k_1 + k_2)} \le P_A(a, c, p) \le \frac{1 - P_1}{1 + (1 - P_1)k_1}.$$

Thus for very large N, the effect of referencing a busy module on an idle line is negligible.

Let us denote the upper and lower bounds of $P_A(a, c, p)$ by $P_A(U)$ and $P_A(L)$ respectively. Hence,

$$P_A(U) = \frac{1 - P_1}{1 + (1 - P_1)k_1}, \qquad P_A(L) = \frac{1 - P_1}{1 + (1 - P_1)(k_1 + k_2)},$$

where $k_1 = \dfrac{p(a-1)}{\ell}$ and $k_2 = \dfrac{p(c-a)}{N}$. Hence

$$\frac{P_A(U) - P_A(L)}{P_A(L)} = k_2 P_A(U).$$

Therefore,

$$D = \frac{P_A(U) - P_A(L)}{P_A(L)} \times 100\% = 100\,k_2\,P_A(U)\,\%.$$

Since $P_A(U) \le 1$, $D \le 100k_2 = \dfrac{100\,p(c-a)}{N}\,\%$, which is independent of the configuration. Notice that for small $\ell$, $P_A(U) \ll 1$, hence $D \ll 100k_2$. For example, if (a, c) = (3, 8), p = 8 and N = 1024, then D < 4%. Since $k_2 = \dfrac{p(c-a)}{N}$, D will be very small for large N. Hence for large N, the lower bound may serve as a good estimation of $P_A(a, c, p)$.

However, another useful upper bound will be presented.
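The bounds of Theorems 3.6.2 and 3.6.3 and the relative gap D are simple to tabulate. The sketch below is a minimal illustration with hypothetical function names; it uses the (a, c) = (3, 8), p = 8, N = 1024 example from the text and, as an added assumption, a few arbitrary choices of $\ell$ that divide N.

```python
def p_malc(lines: int, p: int) -> float:
    """P_1 of Lemma 3.5.1."""
    return 1.0 - (lines / p) * (1.0 - (1.0 - 1.0 / lines) ** p)


def acceptance_bounds(lines: int, m: int, a: int, c: int, p: int) -> tuple[float, float]:
    """Return (P_A(L), P_A(U)) from Theorems 3.6.2 and 3.6.3."""
    n_modules = lines * m
    p1 = p_malc(lines, p)
    k1 = p * (a - 1) / lines
    k2 = p * (c - a) / n_modules
    lower = (1.0 - p1) / (1.0 + (1.0 - p1) * (k1 + k2))
    upper = (1.0 - p1) / (1.0 + (1.0 - p1) * k1)
    return lower, upper


def relative_gap_percent(lower: float, upper: float) -> float:
    """D = 100 * (P_A(U) - P_A(L)) / P_A(L), in percent."""
    return 100.0 * (upper - lower) / lower


if __name__ == "__main__":
    n_modules, a, c, p = 1024, 3, 8, 8
    for lines in (16, 64, 256, 1024):  # illustrative values of l, each dividing N
        lower, upper = acceptance_bounds(lines, n_modules // lines, a, c, p)
        print(lines, lower, upper, relative_gap_percent(lower, upper))
```

For every configuration the printed gap stays below $100k_2 = 100 \cdot 8 \cdot 5 / 1024 \approx 3.9\%$, in line with the D < 4% statement.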




Theorem 3.6.4  For any (a, c) and p ≥ 1, $P_A(a, c, p) \le \min\left(1, \dfrac{\ell}{ap}\right)$.

Proof: In a successive STUs, the expected number of accepted requests is $apP_A$. Since these accepted requests must refer to distinct lines, $apP_A \le \ell$. Thus $P_A \le \dfrac{\ell}{ap}$. Since $P_A \le 1$, the theorem follows. □

This upper bound will be useful in evaluating the effectiveness of buffering the requests, which will be discussed in the next two chapters.


4.  SIMULATION OF BUFFERED AND NONBUFFERED REQUESTS


4.1  Introduction


In the previous chapter, it was assumed that rejected requests were discarded so that the independence and randomness over a uniform distribution assumption could be justified. Despite these assumptions, it was seen that the complexity of the Markov model increased with $\lfloor (c-1)/a \rfloor$ and it became very difficult to obtain analytic results for classes of (a, c) such that $\lfloor (c-1)/a \rfloor \ge 2$.

In this chapter, two practical cases of request schemes will be investigated. In practice, rejected requests are not discarded. Hence one case to be investigated is the nonbuffered request processor (NRP) system, in which rejected requests are not discarded but resubmitted later. The other case is the buffered request processor (BRP) system, in which requests are buffered before being selected for service. Such practical cases are very difficult to model analytically, hence no analytical results have been obtained. However, experimental modeling is used in this chapter to evaluate the effect on performance of buffering requests and of resubmitting rejected requests.

Simulation of the NRP and BRP systems will be used to obtain experimental results. The effect of the following parameters on performance will be investigated.

(a)  Read and write relative module characteristics $(a_r, c_r)$ and $(a_w, c_w)$ respectively

(b)  Memory configuration $(\ell, m)$

(c)  Processor order (s, p)

(d)  Memory size, N

There are two kinds of memory requests issued, namely, read and write requests. A read request takes $a_r$ and $c_r$ segment time units to complete its address and memory cycles, respectively. Similarly a write request takes $a_w$ and $c_w$ segment time units to complete its address and memory cycles, respectively. It is assumed in the simulation model that the proportion of read requests to write requests is 2:1. Hence the effective module characteristics are $(a_e, c_e)$, where $a_e = \tfrac{2}{3}a_r + \tfrac{1}{3}a_w$ and $c_e = \tfrac{2}{3}c_r + \tfrac{1}{3}c_w$. For the case studies, it is further assumed that the read address cycle is equal to the write address cycle. Hence $a_e = a_r = a_w$.
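The effective module characteristics are a weighted average of the read and write characteristics. The small sketch below (hypothetical function name; the numeric values are made up for illustration) uses exact fractions so that the 2:1 read-to-write mix is not blurred by floating-point rounding.

```python
from fractions import Fraction


def effective_characteristics(a_r: int, c_r: int, a_w: int, c_w: int,
                              read_fraction: Fraction = Fraction(2, 3)):
    """Return (a_e, c_e) as read/write averages; the default weight encodes a 2:1 mix."""
    write_fraction = 1 - read_fraction
    a_e = read_fraction * a_r + write_fraction * a_w
    c_e = read_fraction * c_r + write_fraction * c_w
    return a_e, c_e


if __name__ == "__main__":
    # illustrative values only: equal address cycles, write cycle longer than read cycle
    print(effective_characteristics(a_r=2, c_r=4, a_w=2, c_w=6))
```

With equal read and write address cycles the effective address cycle equals both, matching the statement $a_e = a_r = a_w$, while $c_e$ is in general not an integer.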

The  processor  system  consists  of  p parallel  pipelined  processors, 
where  each  processor  is  divided  into  three  units  namely,  the  preprocessor 
or  address  generation  unit  (AGU) , the  process  state  unit  (PSU)  and  the 
processor  or  execution  unit  (EXU).  Each  AGU  consists  of  one  segment 
and  hence  takes  one  segment  time  unit  (STU)  to  complete  Its  processing 
step. 

The  PSU  consists  of  c^  segments  in  series  and  acts  like  a shift 
register,  where  c^  is  the  read  memory  cycle.  Each  segment  contains  the 
process  state  vector  of  a distinct  process.  The  process  state  vector 
may  contain  such  information  as  the  request  status,  function,  address, 
and  priority.  Since  each  process  state  vector  of  a process  traverses 
all  c^  segments,  the  PSU  takes  c^  STUs  to  complete  its  processing  step. 

The EXU consists of two segments in series and takes two STUs to
complete its execution step. It should be pointed out that the total
number of segments in the EXU and AGU can be increased with little or
no difference in the simulation results. Therefore, the instruction
cycle, s = c_r + 3, is fixed for a given pair of module characteristics.

If the first segment contains the ith process state vector, then the
jth segment contains the process state vector corresponding to the
(i + j - 1) mod (c_r + 3) process. The contents of the shift register
are shifted and updated once per STU.
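The instruction cycle and the segment-to-process relation can be written out directly. The sketch below is illustrative only; the function names are hypothetical, and it assumes processes are numbered 0 to s - 1 so that the modulus yields a valid process index.

```python
def instruction_cycle(c_r):
    """Instruction cycle s in STUs: 1 AGU segment + c_r PSU segments + 2 EXU segments."""
    return c_r + 3


def process_in_segment(i, j, c_r):
    """Process whose state vector is in segment j when segment 1 holds process i.

    Assumes processes are numbered 0 .. s - 1 (an assumption of this sketch).
    """
    return (i + j - 1) % instruction_cycle(c_r)


# With a read memory cycle of 4 STUs the instruction cycle is 7 STUs.
print(instruction_cycle(4))         # 7
print(process_in_segment(5, 1, 4))  # 5
print(process_in_segment(5, 3, 4))  # 0
```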

The  AGU  generates  requests,  their  addresses  and  request  functions, 
namely,  read  or  write.  The  following  assumptions  are  made  regarding 
request  arrivals  to  the  memory  system. 

(1)  The arrival time distribution is constant, with p requests
issued at the beginning of every STU.

(2)  Each new request chooses its address from a uniform
distribution from 0 to N-1.

Once a request is issued, it stays in the processor memory system
until serviced to completion. At the completion of service, a request
is terminated and ceases to exist in the system.

Read requests, which are typically instruction or operand fetch
commands, require postprocessing, whereas write requests, which are
typically store commands, do not require postprocessing. Hence after
an accepted read request fetches the instruction or operand from the
memory, it is operated on by the segments of the EXU for two STUs. The
completion of the execution in the EXU terminates the servicing of the
accepted request.

On the other hand, an accepted write request takes one write memory
cycle, c_w STUs, to complete its service in the memory. Usually,
c_w > c_r.

In order to simplify the sequencing problem of the simulation model,
it was assumed that c_w = c_r + 2. Thus simultaneous read and write
requests which are accepted by the memory system will eventually
terminate their services concurrently. That is, while the accepted read
request is about to begin its execution in the first segment of the EXU,
the write request, which was accepted simultaneously with the read
request, will have exactly two more STUs to complete its memory cycle.
Moreover, this sequencing is consistent with a fixed instruction cycle
assumption for read and write instruction cycles. Furthermore, it also
provides for synchronous cycling of the pipelined processor.

It should be noted that the EXU of each processor retains the
process states of processes currently being executed. Furthermore, it
typically serves two additional functions, namely, as an instruction
decoder and as an instruction execution unit. Hence read requests which
are instruction fetches may use the EXU to decode the instruction.
Similarly, a read request which is an operand fetch may use the EXU to
execute the instruction.

The p simultaneous requests with their addresses are processed in
the accept/reject logic to determine which of them are acceptable
requests. That is, the accept/reject logic determines which of the p
requests reference idle modules on idle lines. These acceptable requests
are then subjected to multiple access line collision (MALC) tests in the
priority network to resolve collisions which are due to two or more
acceptable requests referencing the same line. The output of this test
is the set of accepted requests which reference distinct idle lines and
idle modules.

Since we are interested mainly in the overall performance of the
processor memory system and not in the relative performance of one
processor over another, the service discipline becomes irrelevant.
However, for experimental purposes, a simple priority scheme is devised
for the service discipline. This scheme assigns priorities to different
processors such that processor i has priority over processor j, for
i < j, in accessing a line whenever a multiple access line collision
occurs between requests from processors i and j. This priority scheme
does not affect the total system throughput. However, other priority
schemes may be desirable in practice. The overall performance results
are still applicable.
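The two-stage selection described above (accept/reject logic followed by the fixed-priority MALC test) can be sketched as follows. This is a minimal illustration, not the simulator used in the study: the function name, the data layout, and in particular the mapping `module % num_lines` from a module address to its line are assumptions made for this example.

```python
def select_accepted(requests, busy_lines, busy_modules, num_lines):
    """Return the accepted requests for one STU.

    requests     -- list of (processor, module_address), one per processor
    busy_lines   -- set of line numbers currently in an address cycle
    busy_modules -- set of module addresses currently in a memory cycle
    num_lines    -- number of lines in the memory configuration
    """
    winners = {}
    # Visiting processors in increasing index order gives processor i
    # priority over processor j whenever i < j.
    for processor, module in sorted(requests):
        line = module % num_lines  # hypothetical module-to-line mapping
        # Accept/reject logic: keep only requests to an idle module on an idle line.
        if line in busy_lines or module in busy_modules:
            continue
        # MALC test: only the first (highest-priority) request keeps the line.
        if line not in winners:
            winners[line] = (processor, module)
    return sorted(winners.values())


accepted = select_accepted(
    requests=[(0, 5), (1, 9), (2, 7), (3, 6)],
    busy_lines={2},
    busy_modules=set(),
    num_lines=4,
)
print(accepted)  # [(0, 5), (2, 7)]
```

In the example, processors 0 and 1 collide on line 1 and processor 0 wins, while processor 3 is rejected because line 2 is busy.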

Subsequently, the accepted requests are dispatched, through a
p × ℓ crossbar switch, to their respective addressed modules. It is
assumed that it takes zero STUs to perform the above preacceptance
routines. One or two STUs could have been assigned to processing these
routines. The effect would be to increase the instruction cycle, s.
This increase may affect the transient probability of acceptance of a
request in the NRP system because a rejected request is resubmitted s
STUs later. However, it would have little or no effect on the steady
state probability of acceptance of a request.

The performance of a parallel-pipelined processor of order (s, p)
having access to an (ℓ, m) memory configuration with effective module
characteristics (a_e, c_e) is based on the following parameters:

(1)  The probability of acceptance, P_A(a, c, p)

(2)  The expected average wait time of a request from arrival
to acceptance.

The above performance evaluators are applicable to both NRP and BRP
systems. In addition, for the buffered system, we also consider

(3)  The expected average queue length.

The probability of acceptance is the probability that a typical request
which arrives at the memory will be accepted. The throughput of the
processor memory system is proportional to the probability of
acceptance when both the number of processes which request memory within
one effective memory cycle and the memory cycle itself are held fixed.

Since fixed priorities are assigned to processors, a processor
with a higher priority will exhibit lower expected wait time and smaller
expected queue length. In order to take into account the effect of
the priority scheme employed, the expected averages of the wait time
and queue length are computed. The expected average wait time gives
an indication of the turnaround time of a request. A small expected
wait time is a necessity in many real time environments.

The expected average queue length is a function of the proportion
of read to write requests and the priority between them. Although the
proportion of read to write requests is fixed in this model, the
maximum buffer size for each processor is assumed to be infinite. Further
discussion on the queue length is presented in section 4.4.

For each simulation run of the model, the probability of acceptance
and the expected wait time are reported for each memory configuration.
Notice that if the total number of memory modules, N, is 2^n, there are
n + 1 distinct memory configurations. For a given N, all the possible
memory configurations were simulated. Two different sizes of memory
systems, namely, N = 64 and N = 1024, were simulated for four processor
systems with p = 1, 2, 4, and 8, and five sets of module characteristics,
namely, (a_r, c_r)/(a_w, c_w) = (1, 4)/(1, 6), (1, 2)/(1, 4),
(2, 2)/(2, 4), (2, 4)/(2, 6) and (4, 4)/(4, 6).
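The count of n + 1 configurations for N = 2^n can be made concrete by listing them. The sketch below is illustrative; the function name is hypothetical, and it assumes, as the tabulated configurations suggest, that both the number of lines and the number of modules per line are powers of two.

```python
def memory_configurations(num_modules):
    """List every (lines, modules_per_line) pair whose product is N = 2**n."""
    if num_modules < 1 or num_modules & (num_modules - 1):
        raise ValueError("N must be a power of two")
    configs = []
    lines = 1
    while lines <= num_modules:
        configs.append((lines, num_modules // lines))
        lines *= 2
    return configs


print(memory_configurations(64))
# [(1, 64), (2, 32), (4, 16), (8, 8), (16, 4), (32, 2), (64, 1)]
```

For N = 64 = 2^6 this gives 7 configurations, and for N = 1024 = 2^10 it gives 11.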

In each case, the simulation was performed for a fixed number of
instruction cycles because of simulation costs. However, the simulation
was exercised for a higher number of instruction cycles in selected cases
and it was found that no significant changes of P_A(a, c, p) occurred
after the fixed number of instruction cycles. Hence it was assumed that
the results obtained for P_A(a, c, p) represent a steady state value
for most practical purposes.

4.2  Nonbuffered Request Processor System

In the nonbuffered request processor system, when a request is
rejected, its process is blocked for c_r + 3 segment time units. The
rejected request is resubmitted one instruction cycle later with the
same address. This procedure of resubmission of a rejected request is
repeated until the request is accepted, whereupon its process is
unblocked. However, when the process is unblocked, it cannot issue a
new request until the instruction cycle of the accepted request is
completed.

Figure 4.2.1 shows the model of the nonbuffered request processor
system with p = 1. When a request is accepted or rejected, its state
is stored in the first segment of the process state unit, which is
shifted to the right every STU. In addition to the address of the
request, its process number and function, the state of a request includes
a status flag which indicates whether the request was accepted or
rejected. For a read request, the request state is transmitted to the
EXU segments after traversing the PSU segments for c_r STUs. If the
status flag of the request indicates a rejection, the instruction is
not processed; otherwise it is processed. The state of the write
request also traverses the PSU segments for c_r STUs. However, since a
write request requires c_r + 2 STUs for its servicing in the memory
module, the process state of the write request is introduced into the EXU
for 2 STUs, but the write request requires no processing there.

Figure 4.2.1  Nonbuffered request processor system for p = 1

In  effect,  each  AGU  and  EXU  contains  registers  for  storing  the 
states  of  active  processes  associated  with  a processor.  The  PSU  is 
associated  in  time  with  the  memory  system  unit  (MSU). 

This is not the only method of handling rejected requests between
resubmissions. Instead of the recycling technique discussed above, a
buffer could be used to store a rejected request for c_r + 3 STUs and
retry the request at the beginning of its process's next instruction
cycle. However, the control problems associated with this intermediate
buffering technique are rather more involved than those of the recycling
technique.

4.3  Buffered Request Processor System

In the buffered request processor system, there is one buffer for
each of the p processors, each of which corresponds to s processes.
All the requests issued by the AGU of each processor are queued up in
that processor's buffer. Since there are s processes active in each
processor, the queue for a processor will contain all the requests from
the s processes assigned to that processor. Each request issued by
a processor joins the end of the queue for that processor; hence they
are buffered in the order of their arrival. Figure 4.3.1 illustrates
the BRP system for p = 1.

Figure 4.3.1  Buffered request processor system for p = 1

Each segment time unit, the buffered requests in each queue are
simultaneously scanned in the order of their arrival. In each buffer,
the first acceptable request, that is, the first request whose address
corresponds to an idle line and module, is selected for service. This
queueing discipline produces at most p acceptable requests, which then
have to undergo the priority or multiple access line collision test
to determine which of the acceptable requests to accept into the
memory.

A process  with  a buffered  read  request  is  blocked  until  the  re- 
quest is  selected,  whereupon  it  is  immediately  unblocked.  However,  as 
pointed  out  in  the  NRP  system,  the  process  which  has  just  been  unblocked 
cannot  issue  a new  memory  request  until  the  service  of  the  previous  re- 
quest has  been  terminated.  A process  with  a buffered  read  request  is 
blocked  because  subsequent  instructions  or  STUs  of  that  process  may  de- 
pend on  the  data  read  by  the  buffered  request.  Therefore  there  is  a 
dependency  and  further  execution  of  that  process  cannot  proceed  until 
the  buffered  read  request  has  been  serviced. 

Unlike  the  NRP  system,  a buffered  write  request  does  not  cause  its 
associated  process  to  be  blocked.  A write  request  does  not  require 
postprocessing.  Furthermore,  the  only  dependency  which  may  arise  from 
a write  request  is  that  an  instruction  from  the  same  process  (assuming  in- 
dependent processes)  which  follows  the  write  request  may  cause  a reference 
to  the  same  memory  location  as  the  write  request. 




Recall again that exactly one request is issued by each processor
every segment time unit. However, at most one request per processor
is selected for service every STU. Hence every STU, at most one
serviced request is directed to its associated processor. Thus no
collisions occur in the processor.

Since the requests in a buffer are scanned in the order of their
arrival, a request may be selected for service before its predecessor,
provided the request does not reference the same memory location and
the preceding request was rejected. Hence, within a process, a request
may be serviced before its predecessor if the predecessor is a write
request. Notice that since a write request does not cause its associated
process to be blocked, it permits a new request arrival from that
process. Therefore, execution precedence within a process may not be
preserved. However, it can be shown that although execution precedence
within a process may not be preserved, the computational precedence is
maintained. Furthermore, the computational determinacy of a process is in
question only when execution precedence is not maintained within the
process. However, execution precedence is required between two requests
in a buffer only if the requests refer to the same memory location and
hence the same line and module. In such a case, the two requests must
be serviced in the order of their arrival.
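The per-buffer scan of the BRP system can be sketched in a few lines. The function name, the representation of a buffer as a list of module addresses, and the `module % num_lines` line mapping are assumptions of this sketch rather than details taken from the simulator.

```python
def first_acceptable(buffer, idle_lines, idle_modules, num_lines):
    """Return the index of the first acceptable request in a buffer, or None.

    buffer       -- module addresses of the queued requests, oldest first
    idle_lines   -- set of idle line numbers
    idle_modules -- set of idle module addresses
    """
    for index, module in enumerate(buffer):
        line = module % num_lines  # hypothetical module-to-line mapping
        if line in idle_lines and module in idle_modules:
            return index
    return None


# The request at index 0 (module 6, line 2) is passed over because its line
# is busy; the request at index 1 is the first acceptable one.
print(first_acceptable([6, 9, 6, 3], {1, 3}, {3, 9}, 4))  # 1
```

Because the scan runs oldest first and two requests to the same location see the same line and module state, a younger request to a location can never be chosen ahead of an older request to that same location, which is the ordering the text requires.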

4.4  Discussion

For the nonbuffered case, the probability of acceptance, P_A(a, c, p),
obtained from the simulation model is within about 6% of the corresponding
analytic results; hence, the graphical plots appear identical. Table
4.4.1 shows the analytic and experimental results in juxtaposition for
(a, c) = (2, 4 2/3) and N = 64. Similarly, table 4.4.2 shows the
analytic and experimental results for (a, c) = (2, 4 2/3) and N = 1024.
From the tables, it can be seen that the analytic and experimental
results are similar, especially for large values of N.

In general, the experimental results are less than their
corresponding analytic results unless the difference is less than the
error in our evaluation. In addition, as p increases, the difference
between the experimental and analytic results becomes more apparent. For
memory configurations (ℓ, m) such that 1 ≤ ℓ < ap, the maximum service
rate of requests, ℓ/a, is less than the request arrival rate, p, per STU.
Hence there is excessive rejection of requests. In the experimental model
for the NRP system, the rejected requests are resubmitted with the same
module address one instruction cycle later. Hence the address
distribution is not uniform and there is a tendency to reference lines
and modules that cause the rejections more frequently without success.
Therefore, the probability of rejection is higher for the experimental
model.
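The saturation condition ℓ < ap can be checked for each configuration with a short sketch. The function names are hypothetical; the only relation used is the comparison of the maximum service rate ℓ/a with the arrival rate p stated in the text.

```python
from fractions import Fraction


def max_service_rate(num_lines, a):
    """Maximum requests accepted per STU: each line is held for a STUs per request."""
    return Fraction(num_lines, a)


def is_saturated(num_lines, a, p):
    """True when the arrival rate p exceeds the maximum service rate, i.e. l < a*p."""
    return max_service_rate(num_lines, a) < p


# For a = 2 and p = 4, the configurations of N = 64 with fewer than 8 lines
# cannot keep up with the arrival rate.
lines = [1, 2, 4, 8, 16, 32, 64]
print([l for l in lines if is_saturated(l, 2, 4)])  # [1, 2, 4]
```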

It was pointed out earlier that the buffered case has not been
modeled analytically because of the inherent complexity; hence, we do
not have analytic results for the BRP system. However, the results of
the experimental model for the BRP system will be tested by a comparison
with the nonbuffered case. This will be discussed in more detail in the
next chapter.


TABLE 4.4.1

P_A(a, c, p) for (a, c) = (2, 2 2/3) and N = 64

Memory            P_A(a, c, p), p = 2          P_A(a, c, p), p = 4
Configuration
(ℓ, m)            Analytic    Experimental     Analytic    Experimental

(1, 64)           0.2482      0.2501           0.1241      0.1250
(2, 32)           0.4244      0.4058           0.2393      0.2204
(4, 16)           0.6007      0.5745           0.3990      0.3764
(8, 8)            0.7474      0.7438           0.5712      0.5434
(16, 4)           0.8487      0.8431           0.7190      0.6996
(32, 2)           0.9097      0.9044           0.8231      0.8077
(64, 1)           0.9434      0.9316           0.8866      0.8774

TABLE 4.4.2

P_A(a, c, p) for (a, c) = (2, 4 2/3) and N = 1024

Memory            P_A(a, c, p), p = 2          P_A(a, c, p), p = 4
Configuration
(ℓ, m)            Analytic    Experimental     Analytic    Experimental

(1, 1024)         [illegible] 0.2501           [illegible] 0.1250
(2, 512)          0.4280      0.4028           [illegible] 0.2228
(4, 256)          0.6072      0.5931           [illegible] 0.3667
(8, 128)          0.7568      0.7421           [illegible] 0.5506
(16, 64)          0.8604      0.8578           [illegible] 0.7315
(32, 32)          0.9230      0.9233           0.8451      0.8414
(64, 16)          0.9576      0.9551           0.9120      0.9149
(128, 8)          0.9759      0.9731           0.9493      0.9510
(256, 4)          0.9853      0.9853           0.9691      0.9694
(512, 2)          0.9900      0.9902           0.9793      0.9784
(1024, 1)         0.9924      0.9935           0.9845      0.9847


Since there is excessive rejection of requests for memory
configurations (ℓ, m) with 1 ≤ ℓ < ap, some rejected requests may wait
in the system unserviced indefinitely. Furthermore, e(W) and e(Q) may not
reach steady state values in the simulation time period when ℓ is near ap.

e(W) and e(Q) are evaluated as follows. Let e_j(W) be the expected
wait time of requests from processor j. Given that t_ij = time request
(i, j) is accepted - time request (i, j) is issued,

\[ e_j(W) = \frac{1}{R_j} \sum_{i=1}^{R_j} t_{ij}, \]

where (i, j) is the ith request from the jth processor and R_j is the
total number of requests issued by processor j. Hence the expected
average wait time is

\[ e(W) = \frac{1}{p} \sum_{j=1}^{p} e_j(W), \]

where p is the number of processors. For some requests, t_ij cannot be
evaluated since the corresponding requests are still in the system
unaccepted by the end of the simulation.
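The two averaging steps for e(W) translate directly into code. In the sketch below the function name and input layout are hypothetical, and it is assumed that each processor's list holds only the wait times of requests that were accepted before the simulation ended, with at least one entry per processor.

```python
def expected_average_wait(wait_times_by_processor):
    """Compute e(W) from per-processor lists of wait times t_ij (in STUs).

    Step 1: e_j(W) is the mean wait time over the requests of processor j.
    Step 2: e(W) is the mean of the e_j(W) values over the p processors.
    """
    per_processor = [sum(waits) / len(waits) for waits in wait_times_by_processor]
    return sum(per_processor) / len(per_processor)


# Two processors: e_1(W) = 7.0 and e_2(W) = 3.5, so e(W) = 5.25.
print(expected_average_wait([[0, 7, 14], [0, 0, 7, 7]]))  # 5.25
```

Note that e(W) is an average of per-processor means, so processors with fewer completed requests still carry equal weight.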

In order to evaluate e(Q), let q_j(t) represent the current queue
length for processor j. The expected queue length for processor j is

\[ e_j(Q) = \frac{1}{T} \sum_{t=0}^{T} q_j(t), \]

where T is the total simulation time. Then the expected average queue
length is

\[ e(Q) = \frac{1}{p} \sum_{j=1}^{p} e_j(Q). \]




In actual computation, each time an entry is inserted in or removed from
the queue, the product of the old length of the queue and the interval
of time over which this length existed is accumulated. If ℓ < ap,
the queue length will tend to grow to infinity with time. Hence, it is
impossible to obtain a steady state evaluation of e(Q) in the finite
simulation time period when ℓ < ap.
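The accumulation scheme described above, in which the old queue length is multiplied by the time it persisted at every insertion or removal, can be sketched as a small class. The class and method names are hypothetical, and time is measured in STUs.

```python
class QueueLengthTracker:
    """Time-weighted average queue length for one processor's buffer."""

    def __init__(self, start_time=0):
        self.start_time = start_time
        self.last_change = start_time
        self.length = 0
        self.area = 0  # running sum of (queue length) * (time it persisted)

    def _accumulate(self, now):
        # Add the contribution of the old length over the elapsed interval.
        self.area += self.length * (now - self.last_change)
        self.last_change = now

    def insert(self, now):
        self._accumulate(now)
        self.length += 1

    def remove(self, now):
        self._accumulate(now)
        self.length -= 1

    def average(self, end_time):
        self._accumulate(end_time)
        return self.area / (end_time - self.start_time)


tracker = QueueLengthTracker()
tracker.insert(0)   # length 1 during [0, 2)
tracker.insert(2)   # length 2 during [2, 5)
tracker.remove(5)   # length 1 during [5, 10)
print(tracker.average(10))  # 1.3
```

Only the changes in queue length are visited, so the cost is proportional to the number of insertions and removals rather than to the simulation length T.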




5.  ANALYSIS  OF  RESULTS 


5.1  Introduction 

In this chapter, the effects of varying the ℓ, N, a, c, p and τ
parameters will be discussed with a view to understanding the design
options available.

The primary performance indicator is the probability of acceptance,
P_A(a, c, p). P_A(a, c, p) for a given memory configuration represents
the system throughput as a fraction of the maximum possible. However,
in order to compare and contrast systems with different module
characteristics, (a, c), and processor order, p, the bandwidth is used as
a performance indicator.

Definition 5.1.1  The bandwidth of an (ℓ, m) memory configuration
accessed by a parallel-pipelined processor of order (s, p) is the
expected number of accepted memory requests in one memory cycle and is
given by B(a, c, p) = p c P_A(a, c, p), where (a, c) are the module
characteristics.

Hence B(a, c, p) represents the system throughput per memory cycle.
Notice that the bandwidth, B(a, c, p), is the effective parallelism in
the memory.

When the read and write module characteristics differ, the
effective module characteristics, namely, (a_e, c_e), will be used in
place of (a, c). To illustrate the effect of the segment time unit, τ, on
performance, it is helpful to express the bandwidth as the expected
number of accepted memory requests per second. In this case
B(a, c, p, τ) = (p/τ) P_A(a, c, p) requests/second.
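The bandwidth definition can be evaluated numerically as follows. The per-cycle form B = p c P_A is taken from Definition 5.1.1; the per-second form below is an assumption of this sketch, obtained by dividing the per-cycle bandwidth by the duration c·τ of one memory cycle. The function names and the value of τ are hypothetical.

```python
def bandwidth_per_cycle(p, c, p_accept):
    """B(a, c, p) = p * c * P_A: expected accepted requests per memory cycle."""
    return p * c * p_accept


def bandwidth_per_second(p, p_accept, tau_seconds):
    """Accepted requests per second, assuming a memory cycle lasts c * tau seconds.

    Dividing p * c * P_A by c * tau leaves p * P_A / tau.
    """
    return p * p_accept / tau_seconds


# Hypothetical operating point: p = 4 processors, c = 4 STUs, P_A = 0.9,
# and a segment time unit of 100 ns.
print(bandwidth_per_cycle(4, 4, 0.9))
print(bandwidth_per_second(4, 0.9, 100e-9))
```

At this operating point the bandwidth is about 14.4 requests per memory cycle, or about 36 million requests per second.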

A secondary performance indicator is the expected average wait time,
e(W), which is the average time a request waits before it is serviced,
assuming that all requests which arrive simultaneously have the same
chance of being accepted. e(W) may be used in conjunction with
P_A(a, c, p) or B(a, c, p) in the nonbuffered and buffered request
processor systems to evaluate the effectiveness of various parameters.
The expected average queue length, e(Q), is the average length of each
queue, assuming that all p processors have identical priority. e(Q) is
also used in evaluating the effectiveness of the buffered case.

The effects of parameter variation on performance are discussed
primarily for the nonbuffered case. These effects can be easily
extended to the buffered case. Since there is no exact solution for P_A
in the general case, the lower bound is used occasionally to illustrate
the effects of some of the parameters on the probability of acceptance,
P_A(a, c, p).

It is impractical to show the combined effects of all the parameters
pictorially on a two-dimensional graph. A simplification, which is
adopted here, studies the effect of each variable on performance
independently.

5.2  Effect  of  Number  of  Modules  (N)  on  Performance 

The total number of memory modules, N, can be interpreted in two
ways. For a given memory size of M words, various memory configurations
are obtainable. By varying the size of the memory module, z, the
total number of memory modules, N, can be varied. Alternatively, the
total number of memory modules, N, represents the total size of the
memory if the size of the memory module, z, is fixed.

In the discussion that follows, M is considered fixed and hence
plays no further role. The first interpretation of N is of primary
interest. The determination of the absolute memory size, M, which
usually involves page faulting and memory hierarchy considerations, is
outside the scope of this research.

It is intuitively obvious that for a given number of lines, ℓ,
and processor configuration, (s, p), an increase in N increases P_A.
However, the relative increase cannot be obtained intuitively. In some
situations, doubling N may not significantly increase P_A.

Figures 5.2.1, 5.2.2, and 5.2.3, which were obtained from the
results of theorems 3.5.2 and 3.5.3, illustrate the effect of N (≥ ℓ) on
P_A for ℓ = 4, 16 and 64 respectively. In general, an increase in N
increases the performance for given ℓ, a, c and p.

As N approaches infinity, the lower bound on P_A of theorem 3.6.2
approaches the upper bound on P_A of theorem 3.6.3, as shown below. Recall
that

\[ 1 - P_1 = \frac{\ell}{p}\left[1 - \left(1 - \frac{1}{\ell}\right)^{p}\right] \]

and the lower bound is given by

\[ P_A(a, c, p) \ge \frac{1 - P_1}{1 + (1 - P_1)\left[\dfrac{p(a-1)}{\ell} + \dfrac{p(c-a)}{N}\right]}. \]

But the upper bound of theorem 3.6.3 says

\[ P_A(a, c, p) \le \frac{1 - P_1}{1 + (1 - P_1)\dfrac{p(a-1)}{\ell}}. \]

Thus,

\[ \lim_{N \to \infty} P_A(a, c, p) = \frac{1 - P_1}{1 + (1 - P_1)\dfrac{p(a-1)}{\ell}}. \]

Figure 5.2.1  Effect of N on P_A for ℓ = 4

Figure 5.2.2  Effect of N on P_A for ℓ = 16

Figure 5.2.3  Effect of N on P_A for ℓ = 64
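The convergence of the two bounds can be observed numerically. The sketch below assumes the bound expressions in the forms 1 - P_1 = (ℓ/p)[1 - (1 - 1/ℓ)^p], lower bound (1 - P_1)/{1 + (1 - P_1)[p(a-1)/ℓ + p(c-a)/N]}, and upper bound (1 - P_1)/{1 + (1 - P_1) p(a-1)/ℓ}; the function names are hypothetical.

```python
def one_minus_p1(num_lines, p):
    """1 - P_1 = (l/p) * [1 - (1 - 1/l)**p]."""
    return (num_lines / p) * (1 - (1 - 1 / num_lines) ** p)


def lower_bound(num_lines, num_modules, a, c, p):
    """Assumed form of the lower bound on P_A (theorem 3.6.2)."""
    q = one_minus_p1(num_lines, p)
    return q / (1 + q * (p * (a - 1) / num_lines + p * (c - a) / num_modules))


def upper_bound(num_lines, a, p):
    """Assumed form of the upper bound on P_A (theorem 3.6.3); independent of N and c."""
    q = one_minus_p1(num_lines, p)
    return q / (1 + q * p * (a - 1) / num_lines)


# For l = 4, (a, c) = (2, 4) and p = 4, the gap between the bounds
# shrinks toward zero as N grows.
for n in (16, 256, 4096, 65536):
    gap = upper_bound(4, 2, 4) - lower_bound(4, n, 2, 4, 4)
    print(n, round(gap, 5))
```

The only N-dependent term, p(c - a)/N, vanishes as N grows, which is why the memory cycle c stops mattering in the limit.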

From the bounds expressions above, it is seen that for very large N,
the memory cycle, c, does not have a significant effect on P_A. Note that
c is usually a small number in most systems, generally less than 10.
Hence, for large N, P_A is limited by ℓ and p. The graphs also show that
the address cycle, a, highly affects P_A for large N. As an illustration,
consider figure 5.2.1, which is for ℓ = 4. Suppose that p = 4 and P_A is
required to be 0.4. Using (a, c) = (2, 4), N is required to be at least
256, whereas, if (a, c) = (1, 4), N is required to be only 16.
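The illustration above can be reproduced approximately by searching for the smallest power-of-two N that meets a target P_A. This sketch uses the lower bound as a stand-in for the plotted curves, which is an assumption: the figures were obtained from theorems 3.5.2 and 3.5.3, not from the bound. The function names and the assumed form of the bound are hypothetical.

```python
def one_minus_p1(num_lines, p):
    """1 - P_1 = (l/p) * [1 - (1 - 1/l)**p]."""
    return (num_lines / p) * (1 - (1 - 1 / num_lines) ** p)


def lower_bound(num_lines, num_modules, a, c, p):
    """Assumed form of the lower bound on P_A."""
    q = one_minus_p1(num_lines, p)
    return q / (1 + q * (p * (a - 1) / num_lines + p * (c - a) / num_modules))


def smallest_n(num_lines, a, c, p, target, limit=2 ** 20):
    """Smallest power-of-two N >= l whose lower bound reaches the target P_A."""
    n = num_lines
    while n <= limit:
        if lower_bound(num_lines, n, a, c, p) >= target:
            return n
        n *= 2
    return None  # target not reachable within the search limit


print(smallest_n(4, 2, 4, 4, 0.4))  # 256
print(smallest_n(4, 1, 4, 4, 0.4))  # 16
```

Under these assumptions the search returns N = 256 for (a, c) = (2, 4) and N = 16 for (a, c) = (1, 4), in agreement with the values read from figure 5.2.1.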

For large ℓ and small p, and for small ℓ and large p, there is
progressively less payoff (increase in P_A) to increasing N. However,
for small ℓ and small p, and large ℓ and large p, there is some
significant payoff to increasing N. Figures 5.2.1 and 5.2.3 illustrate
these phenomena. It is obvious from a glance at the figures that the
choice of N is critical in a certain range of memory configurations and
processor systems. However, it is clear that the choice of N cannot be
made independently from other parameters. This discussion will be
exemplified by some case studies in section 5.8.


5.3  Effect  of  the  Number  of  Lines  on  Performance 

It is obvious from the collision considerations that an increase
in ℓ reduces the line collisions and multiple access line collisions and
hence increases the performance. Figures 5.3.1 and 5.3.2, which were
obtained from the results of theorems 3.5.2 and 3.5.3, illustrate the
effect of the number of lines on P_A for N = 64 and N = 1024 respectively.

For ℓ ≪ p, the lines are saturated. Hence in this region there
is very little payoff in performance to doubling ℓ. It is obvious that
configurations with ℓ < p result in poor and hence undesirable
performance because of the excessive line saturation in this region.
Furthermore, the line saturation worsens as a increases.

There is a point of inflection at ℓ = p. For ℓ in the neighborhood
of p and ap, there is a significant increase in P_A for an increase in ℓ.
Hence this is the region in which the partial derivative of P_A with
respect to ℓ is greatest. Beyond ℓ = ap, P_A is close to 1. Hence, an
increase in ℓ increases P_A, which asymptotically approaches the upper
bound of theorem 3.6.3. Notice that in this case the asymptote is

\[ \frac{1 - P_1}{1 + (1 - P_1)\dfrac{p(a-1)}{\ell}}, \qquad \text{where } 1 - P_1 = \frac{\ell}{p}\left[1 - \left(1 - \frac{1}{\ell}\right)^{p}\right]. \]

5.4  Effect of Module Characteristics on Performance

It is obvious that an increase in the address cycle, a, increases the
line collisions, whereas an increase in the memory cycle, c, increases the
module collisions. Hence as a or c increases, the probability of
acceptance decreases. The effects are, however, minimal when ℓ and N,
respectively, are sufficiently large.

Recall that a reduction in ℓ increases the number of line collisions
and multiple access line collisions. Hence as a increases for small ℓ,
P_A is sharply reduced. As ℓ increases, the line collisions and multiple
access line collisions are reduced and P_A is less sensitive to an increase
in a. This effect is illustrated in figure 5.4.1, where P_A is plotted
against (a, 4) for a = 1, 2, 3 and 4 for various memory configurations.
Figures 5.4.1 and 5.4.3 were obtained from results of theorems 3.5.2
and 3.5.3.

The effect of a can also be explained somewhat analytically by
using the lower bound formula for P_A, which is

\[ P_A(a, c, p) \ge \frac{1 - P_1}{1 + (1 - P_1)\left[\dfrac{p(a-1)}{\ell} + \dfrac{p(c-a)}{N}\right]}. \]

There are essentially three regions of ℓ in which the effects of a
are highlighted. For small ℓ such that ℓ < p and ℓ ≪ N, the lines are
saturated and P_A ≈ ℓ/(ap). Hence P_A is inversely proportional to a. For
ℓ nearly N, an increase in a increases the p(a-1)/ℓ term while decreasing
the p(c-a)/N term. The overall effect is a slight increase in the
denominator of the lower bound expression with increase in a. Hence P_A
decreases slightly with increase in a for ℓ nearly N. When ℓ = N, theorem
3.6.1 can be applied. It was seen then that for ℓ = N, P_A is independent
of a. Hence P_A remains constant for ℓ = N as a increases.

In section 5.2, it was shown that P_A is insensitive to c for large
N (≫ pc). This insensitivity is illustrated in figure 5.4.2, which was
obtained from results of theorem 3.5.2, for N = 256 and p = 4. It should
be noted that c cannot, however, be arbitrarily large, since c < s, which
is the degree of pipelining.

The combined effect of a simultaneous increase in a and c is
illustrated in figure 5.4.3 for c/a = 2. It is also seen here that a is
critical in determining P_A.

Figure 5.4.1  Effect of (a, c) on P_A for c = 4, p = 4, and N = 256

Figure 5.4.2  Effect of (a, c) on P_A for a = 1, p = 4, and N = 256

Figure 5.4.3  Effect of (a, c) on P_A for c/a = 2, p = 4, and N = 256

It appears that a reasonable conclusion is that in general, if N
is high enough, variations in c have little effect on P_A for any
configuration. If ℓ is large or ℓ is close to N, variation in a has little
effect on P_A.

5.5  Effect  of  Processor  Order  on  Performance 

The  choice  of  p is  very  critical  in  obtaining  a reasonable  perform- 
ance. Once  again,  employing  the  lower  bound  formula  for  P^,  the  sensi- 
tivity of  P^  to  p can  be  evaluated  for  some  class  of  p.  For  large  p 
such  that  p » £,  (1  - ^ 0.  Hence,  4 Hc-T5T  = large 

N.  In  this  region,  P^  is  small.  Furthermore,  increasing  p without  limit 

decreases P_A asymptotically to zero. This effect is illustrated in figure
5.5.1. However, for small p such that p << ℓ,
$(1 - 1/\ell)^p \approx 1 - p/\ell$. Hence,

$$P_A \approx \frac{1}{1 + \frac{p(a-1)}{\ell} + \frac{p(c-a)}{N}}.$$

In this region, P_A is close to 1. An increase in p does not affect P_A
significantly as long as p << ℓ and N ≥ ℓ. For example, in figure 5.5.1,
if (a, c) = (2, 4), ℓ = 64 and p = 2, an increase in p by 100% (i.e., to
4) decreases P_A by less than 6%. However, for smaller ℓ or larger p, P_A
becomes more sensitive, as is evidenced for ℓ = 16 or p = 16 to 32.
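
The small-p approximation can be checked numerically. The following
Python sketch evaluates P_A ≈ 1/(1 + p(a-1)/ℓ + p(c-a)/N) for the example
quoted from figure 5.5.1. The function name is illustrative only, and
N = 256 is an assumption made here because it is the value used in the
neighbouring figures; the text does not state N for this example.

```python
def pa_small_p(p, a, c, lines, n_modules):
    """Approximate probability of acceptance, valid only for p << lines."""
    return 1.0 / (1.0 + p * (a - 1) / lines + p * (c - a) / n_modules)


# Example from the text: (a, c) = (2, 4), 64 lines, p doubled from 2 to 4.
# N = 256 is an assumption, not a value given for this example.
before = pa_small_p(2, 2, 4, 64, 256)
after = pa_small_p(4, 2, 4, 64, 256)
relative_drop = (before - after) / before

print(f"P_A(p=2) = {before:.3f}")
print(f"P_A(p=4) = {after:.3f}")
print(f"relative decrease = {relative_drop:.1%}")
```

Under these assumptions the relative decrease stays below the 6% quoted
in the text; the approximation is not meant to replace the exact results
of chapter 3.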

The total number of memory modules, N, influences the effect of p
on P_A. An increase in N shifts the curves up, that is, toward P_A = 1,
and flattens them, making them less sensitive to p. Similarly, the
sensitivity of P_A to p is accentuated by a decrease in N.

However, if bandwidth is the performance criterion, then, as illustrated
in figure 5.5.2, an increase in p generally increases the bandwidth.


Figure 5.5.1 (vertical axis: probability of acceptance, P_A(a, c, p))

Figure 5.5.2 Effect of p on B for N = 256
(axes: bandwidth, B, in requests per memory cycle, versus number of
processors, p)


The leveling off of the bandwidth curves for p > ℓ also illustrates the
saturation of the lines. For p << ℓ, P_A is close to 1 and an increase
in p increases the bandwidth almost linearly.

In summary, for ℓ >> p and very large N, P_A is close to 1 and is
not very sensitive to small variations in p. Hence the bandwidth
increases almost linearly with p. As p/ℓ increases to 1, P_A becomes very
sensitive to variations in p and there is a point of inflection in the
bandwidth curves in this region. Once p > ℓ, P_A decays asymptotically
to zero as p increases further. In this region the bandwidth curves
flatten.


5.6  Effect  of  Processor  Speed  on  Performance 


The segment time unit, τ, was not really considered explicitly in
the analytical or simulation models. However, its effect can be explained
in the following ways with some examples. Recall that the absolute and
relative module characteristics are (a_o, c_o) and (a, c) respectively,
where

$$a = \left\lceil \frac{a_o}{\tau} \right\rceil \quad \text{and} \quad c = \left\lceil \frac{c_o}{\tau} \right\rceil.$$

Suppose the absolute module characteristics (a_o, c_o) are given as
(200, 400) in nanoseconds. If τ = 200, 100 and 50 nanoseconds, then
(a, c) = (1, 2), (2, 4) and (4, 8) respectively. In section 5.4, the
effects of (a, c) on the probability of acceptance, P_A, were discussed.
Recall that for large N and small values of ℓ, a simultaneous increase
in a and c decreases P_A exponentially. Thus a decrease in τ causes a
decrease in P_A. In this case, however, P_A is not an appropriate
performance indicator of the effect of τ, since decreasing τ also
increases the rate of requests to the memory. Instead, the absolute
bandwidth, B(a, c, p, τ), is adopted as a measure of performance:

$$B(a, c, p, \tau) = \frac{p}{\tau} P_A(a, c, p).$$
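
As a minimal sketch of the conversion between absolute and relative
module characteristics, the Python fragment below reproduces the three
(a, c) pairs quoted for (a_o, c_o) = (200, 400) ns. Two points are
assumptions rather than statements of the report: the relative values are
taken as the absolute times divided by τ and rounded up, and the absolute
bandwidth is taken as p accepted-request opportunities per segment time
unit, scaled by P_A. The function names are hypothetical.

```python
import math


def relative_characteristics(a_abs_ns, c_abs_ns, tau_ns):
    """Relative (a, c) in segment time units; rounding up is an assumption."""
    return math.ceil(a_abs_ns / tau_ns), math.ceil(c_abs_ns / tau_ns)


def absolute_bandwidth(p, tau_ns, p_accept):
    """Accepted requests per nanosecond, assuming B = (p / tau) * P_A."""
    return p / tau_ns * p_accept


for tau in (200, 100, 50):
    a, c = relative_characteristics(200, 400, tau)
    print(f"tau = {tau:3d} ns -> (a, c) = ({a}, {c})")
```

With P_A held fixed, `absolute_bandwidth` doubles when τ is halved, which
is the behaviour described for the uniformly faster processor/memory
system.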

Three examples are developed below to assess the effects of decreasing
τ. These correspond, respectively, to a constant request rate
assumption, a higher request rate assumption, and a uniformly faster
processor/memory system assumption.

Suppose, for a first example, the number of processes which request
memory within one memory cycle is fixed at I_o as τ varies. Then I_o = pc.
Therefore doubling the speed of the processor (halving τ) doubles c and,
in effect, halves p in order to keep I_o constant. Hence B(a, c, p, τ)
can be rewritten as

$$B(a, c, p, \tau) = \frac{I_o}{c\tau} P_A(a, c, p) = \frac{I_o}{c_o} P_A(a, c, p),$$

where p is proportional to τ and c is inversely proportional to τ. Figure
5.6.1, which is obtained from results of theorems 3.5.2 and 3.5.3,
illustrates the example given above for I_o = 8, (a_o, c_o) = (200, 400)
ns, N = 256, and p going from 1 to 4 as τ goes from 50 to 200. Observe
that an increase in the speed of the processor (decrease in τ) reduces
the bandwidth, especially for configurations with small ℓ. For large
values of ℓ, the reduction is less noticeable. Recall that a reduction in
p increases the probability of acceptance. The reduction in the bandwidth
is due to a combination of the contrary effects of reducing p while
increasing (a, c) simultaneously as the speed is increased (τ is
decreased).
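
The constant request rate example can be tabulated with a short Python
sketch. It assumes, as in the example, that c_o is an exact multiple of τ
and that I_o is an exact multiple of c; the function name and the tuple
layout are illustrative, not taken from the report.

```python
def constant_rate_schedule(i_o, c_abs_ns, taus_ns):
    """For each tau return (tau, c, p) such that I_o = p * c stays fixed."""
    rows = []
    for tau in taus_ns:
        c = c_abs_ns // tau  # relative memory cycle (exact division assumed)
        p = i_o // c         # processors needed to keep p * c = I_o
        rows.append((tau, c, p))
    return rows


rows = constant_rate_schedule(8, 400, (50, 100, 200))
for tau, c, p in rows:
    print(f"tau = {tau:3d} ns: c = {c}, p = {p}, p*c = {p * c}")
```

The rows show p rising from 1 to 4 while c falls from 8 to 2, so a faster
processor in this scenario means fewer processors issuing requests.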

The above example illustrates one important factor. For a constant
I_o, a decrease in processor speed increases p and decreases c
simultaneously. For the constant I_o assumption, doubling τ halves s.
Hence sp is constant as τ changes. Under the constant I_o restriction, it
can be deduced from figure 5.6.1 that increasing τ increases the
bandwidth. Hence for a constant sp, bandwidth is maximized by maximizing
p. In the above example, a change in τ corresponds to physical changes in
the processor design.

Figure 5.6.1 Effect of Processor Speed on Bandwidth for a Constant
Request Rate
(horizontal axis: segment time, τ (nsec) = 50, 100, 200, with p = 1, 2, 4
and c = 8, 4, 2; (a_o, c_o) = (200, 400) ns, I_o = 8, N = 256)

Consider a second example in which I_o is not fixed. Assume that
N, p and (a_o, c_o) are fixed but τ varies. Note that s will still vary
inversely with τ and therefore may necessitate major changes in the
processor system. Figure 5.6.2, which is obtained from results of
theorems 3.5.2 and 3.5.3, illustrates the higher request rate example
for N = 256, p = 4 and (a_o, c_o) = (200, 400) ns. In this case, an
increase in the processor speed increases the degree of pipelining, s,
and the rate of requesting memory. In general, an increase in the
processor speed increases the bandwidth. For large ℓ, P_A approaches 1
and the bandwidth is inversely proportional to τ.

In a third example, it is assumed that I_o, N, s, p, a and c are
fixed. Decreasing τ then simply corresponds to faster clocking of the
processor. Furthermore, decreasing τ requires a proportional decrease
in (a_o, c_o) so that (a, c) is fixed. For an example, let N = 256, p = 4,
and (a, c) = (1, 2). P_A(a, c, p) is constant for a given memory
configuration as processor and memory speed changes. Figure 5.6.3, which
is obtained from the results of theorem 3.5.2, shows the effect of the
processor and memory speed on the bandwidth for various memory
configurations. In all configurations, an increase in speed (decrease in
τ) increases the bandwidth proportionally. Notice that the bandwidth
relative to the memory cycle is constant for a given (ℓ, m) as the speed
changes. In fact, doubling the speed doubles the absolute bandwidth.


Figure 5.6.2 Effect of Processor Speed on Bandwidth
for Varying Request Rate

Figure 5.6.3 Effect of Processor and Memory Speed on Bandwidth,
Uniform Memory System
(horizontal axis: segment time, τ (nsec))


This is expected since doubling the processor speed necessitates a new
memory module whose absolute module characteristics are half the
previous ones. In this example, a change in the processor speed will
therefore require a change of the old memory system to a new one as
explained above.

5.7 Effect of Buffering on Performance

For any configuration, the buffered scheme produces as good or better
probability of acceptance, P_A, than the nonbuffered scheme. All
illustrations in this section are obtained from simulation results.
Examples are illustrated in figures 5.7.1 and 5.7.2. Recall that a
processor's buffer is scanned every STU for an acceptable request. This
process is more likely to produce an acceptable request than the
recycling technique for the NPR system. Hence for the BPR system, a
rejected request remains in the queue, which is rescanned the next STU,
thus giving the rejected request another trial if none of the preceding
requests is an acceptable request.

Figures 5.7.3 and 5.7.4 give a very good indication of the effect of
buffering on the bandwidth for N = 64 and N = 1024 respectively, for
(a_e, c_e) = (2, 4.67). Recall that (a_e, c_e) are the effective module
characteristics, which take into account the read and write module
characteristics and request distribution. A point of inflection occurs
at ℓ = p. For ℓ < p, the lines are saturated, resulting in excessive
blocking of processes. Again, a configuration with ℓ < p is not desirable.

In general, the bound of theorem 3.6.4 applies to the buffered case
as well as the nonbuffered case. An example of this bound is shown in
figure 5.7.5 for N = 1024. As can be seen from figures 5.7.3 and 5.7.4,
buffering tends to have its maximum effect near ℓ = ap, where the upper
bound on the bandwidth from theorem 3.6.4 is very weak for the
nonbuffered case. For small ℓ and large N, the bandwidth of the
nonbuffered case is close to the bound, implying line saturation. In this
case, buffering has little effect. For large ℓ and N, P_A approximates 1
in the nonbuffered case as shown in figure 5.7.2. Again, buffering cannot
improve P_A significantly. However, if p is large so that cp is close to
N, buffering can improve the bandwidth significantly when the lines are
not saturated, as shown in figure 5.7.3.

Figure 5.7.1 Probability of Acceptance Versus Number of Processors
for Some Configurations, where N = 64 and (a_e, c_e) = (2, 4.67)
(horizontal axis: number of parallel-pipelined processors, p)

Figure 5.7.2 Probability of Acceptance Versus Number of Processors
for Some Configurations, where N = 1024 and (a_e, c_e) = (2, 4.67)

Figure 5.7.3 Bandwidth Versus Memory Configuration for
Various Values of p (number of processors),
where N = 64 and (a_e, c_e) = (2, 4.67)
(horizontal axis: number of lines, ℓ)

Figure 5.7.4 Bandwidth Versus Memory Configuration for
Various Values of p (number of processors),
where N = 1024 and (a_e, c_e) = (2, 4.67)

All the parameters discussed in earlier sections affect the performance
in the buffered scheme as they do in the nonbuffered scheme. For
example, figures 5.7.6 and 5.7.7, when compared with 5.7.3 and 5.7.4,
illustrate the effect of reducing the address cycle, a, from 2 to 1. It
is seen that this change increases the bandwidth for small values of ℓ
in both schemes. However, since c remains constant, the maximum bandwidths
are identical for a = 1 and a = 2 and occur at ℓ = N. In effect, the
bandwidth curves for a = 1 level off much earlier than their counterparts
for a = 2. The overall effect of reducing a is to achieve a higher
bandwidth for configurations with small ℓ.

Buffering can thus most effectively be used for two purposes. One,
to increase the performance of a configuration with ℓ in the vicinity of
ap. Two, when ℓ exceeds ap, buffering may be used to maintain bandwidth
while reducing ℓ. In the nonbuffered case, a rejected request cannot
make a retrial in less than an instruction cycle length, whereas the
interval between retrials of a rejected request in the buffered case is
less than an instruction cycle. Hence buffering has a smaller expected
average wait time of a request, e(W), as evidenced in the examples of
figures 5.7.8 and 5.7.9. These curves, which are derived from simulations,
are shown for the more stable region, namely ℓ ≥ p, for which the
simulation reaches steady state easily. As N increases, e(W) decreases.
Buffering also tends to have its maximum effect on e(W) for ℓ in the
vicinity of ap.

Figure 5.7.5 Maximum Bandwidth Versus Memory Configuration for
Various Values of p (number of processors)

Figure 5.7.6 Bandwidth Versus Memory Configuration for Various Values
of p, where N = 64 and (a_e, c_e) = (1, 4.67)
(vertical axis: expected value of bandwidth, B)

Figure 5.7.7 Bandwidth Versus Memory Configuration for Various
Values of p, where N = 1024 and (a_e, c_e) = (1, 4.67)

Although the maximum queue length allowable for each processor in
the simulation model is infinite, the expected average queue length,
e(Q), is quite small in these examples. Figures 5.7.10 and 5.7.11
illustrate the expected average queue lengths for various memory
configurations and processor orders with (a_e, c_e) = (2, 4.67) and
N = 64 and N = 1024 respectively. Increasing N decreases e(Q), especially
for large ℓ. The read and write request distribution also affects the
queue length as explained in section 4.3.

5.8  Design  Tradeoffs 

In this section, certain concepts for designing a parallel-pipelined
processor-memory system are introduced by way of examples. Various
options open to the designer in meeting a required design objective are
investigated. Since the number of parameters is large, the number of
available options is so large that it would be impractical to consider
all the possible options and derive a rigorous design methodology that
is applicable to the general case. However, a designer can make
appropriate decisions at various stages of the design to reduce the
number of options and make the design problem less difficult to handle.


Figure 5.7.8 Wait Time Versus Memory Configuration for Various p,
where N = 64 and (a_e, c_e) = (2, 4.67)
(horizontal axis: number of lines, ℓ)

Figure 5.7.10 Queue Length Versus Memory Configuration for Various p,
where N = 64 and (a_e, c_e) = (2, 4.67)
(horizontal axis: number of lines, ℓ)


Often the cost-performance of a design is the criterion used to
evaluate the merit of the design. A design may have very high
performance, but may not be economically feasible. Furthermore, it may
not be technologically attainable. For example, as the segment time unit,
τ, approaches zero, the bandwidth approaches infinity. Eventually
prohibitive cost yields to technological impossibility as performance
is driven higher. Economic factors and technological constraints must
be considered.

An increase in the number of lines, ℓ, requires a more expensive
line decoder to decode the line addresses. Recall that the number of
gates in a p by ℓ crossbar switch system is proportional to pℓ. Hence
an increase in ℓ or p increases the cost of the crossbar switch matrix.
In addition, an increase in p increases the cost of the processor
system, although it also increases the bandwidth of the system. From
previous discussion, it was seen that an increase in ℓ increases the
probability of acceptance of a request. An increase in ℓ increases the
number of line drivers that are required to drive the module addresses
down the line. Line drivers are also required to drive the data on the
data buses. The line driver capability depends on the number of memory
modules, m, on each line. As m increases, the required line driver
capability must increase and may result in a higher cost for each line
driver.

An increase in the total number of memory modules, N, increases the
performance, but may not significantly increase the total cost of the
memory of size M (= N·z), because, for semiconductor memory, the cost per
bit is virtually constant for a wide range of module sizes, z. However,
the increase in N requires a larger decoder, which is used to decode the
memory address to select the addressed memory module. Hence an increase
in N may increase the cost of the memory address decoder. The choice
of a, c, ℓ, N and p also affects the size and hence the cost of the
function unit that is required to accept or reject an incoming request
which may encounter a line or module collision.

Such  contrasting  requirements  tend  to  pose  a limiting  constraint 
on  the  design  parameters. 

In some semiconductor memory modules, the absolute module
characteristics (a_o, c_o) are such that a_o = c_o. However, with present
trends in technology, it has been possible to reduce the address cycle
time, a_o, such that a_o < c_o. It was shown that the probability of
acceptance for a given τ increases with decrease in a_o for small values
of ℓ. In some semiconductor LSI memories, for example, in the MOS
technology, a_o has been significantly reduced without significant
increase in the cost per bit of the memory chip. The memory cycle, c_o,
has been decreasing with technological advancement. However, the cost per
bit of the memory chip does not necessarily increase with decreases in
c_o. Hence it may be assumed that for a certain range of c_o, the cost
per bit is virtually constant. However, in general, the cost per bit of
a memory chip is related to the cycle time.

Three case studies are discussed to illustrate some design tradeoffs.
One achieves at least a bandwidth, B_o, for the nonbuffered scheme. Two
achieves at least a probability of acceptance, P_A, for good turn-around
time. The third case study, which has two examples, illustrates the use
of buffering. In all cases, the different memory configurations which
meet the design constraints are investigated.


In the first example, the main design goal is to achieve at least a
specified bandwidth for a parallel-pipelined processor of order (s, p)
having access to an (ℓ, m) memory configuration with absolute module
characteristics, (a_o, c_o). For this case study, assume that the
following parameters are fixed.

1. Processor order (s, p) = (s, 4), s > c.

2. Segment time unit, τ seconds.

3. Module characteristics (a, c) = (1, 4).

Suppose it is required to obtain a bandwidth B = 15.00 requests in
one memory cycle. Hence, P_A ≥ 15/16 ≈ 0.94. Assume also that a
nonbuffered scheme is preferred; then theorem 3.5.2 can be applied since
a = 1. Below is a list of possible memory configurations and their
corresponding performances. This table is constructed by choosing N and
finding a minimum ℓ that obtains a P_A ≥ 0.94.


Total number     Memory             Probability of acceptance
of modules, N    configuration      P_A(a, c, p) = P_A(1, 4, 4)
                 (ℓ, m)

256              (128, 2)           0.95
512              (64, 8)            0.96
1024             (32, 32)           0.94
∞                (32, ∞)            0.95

Notice that all the above memory configurations produce a probability
of acceptance of at least 0.94 as required. Also notice that increasing
the total number of memory modules, N, beyond 1024 cannot reduce the
number of lines, ℓ, below 32 while maintaining a P_A of at least 0.94.
In order to arrive at a cost-effective design, the cost factors of the
various design options, trading off N against ℓ, have to be taken into
consideration. If ℓ is very critical and needs to be reduced below 32,
the buffering scheme will have to be used, since a is 1 and cannot be
reduced further.
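
The floor of 32 lines can be illustrated with a rough Python sketch for
the limiting case of very many modules. It uses the expression
(ℓ/p)[1 - (1 - 1/ℓ)^p], which section 5.9 quotes from theorem 3.5.2 for
a = c = 1. Applying it here is an assumption: it treats module conflicts
as negligible when N is very large, so that only line conflicts remain.
Restricting ℓ to powers of two is also an assumption, made because every
ℓ in the table is a power of two. The function names are hypothetical.

```python
def pa_lines_only(p, lines):
    """P_A when only line conflicts matter (a = 1, module conflicts ignored)."""
    return lines / p * (1.0 - (1.0 - 1.0 / lines) ** p)


def min_lines(p, target, max_lines=1024):
    """Smallest power-of-two number of lines whose P_A meets the target."""
    lines = 1
    while lines <= max_lines:
        if pa_lines_only(p, lines) >= target:
            return lines
        lines *= 2
    return None


for lines in (8, 16, 32, 64):
    print(f"lines = {lines:2d}: P_A = {pa_lines_only(4, lines):.3f}")
print("minimum lines for P_A >= 0.94 with p = 4:", min_lines(4, 0.94))
```

Under these assumptions 16 lines fall short of 0.94 and 32 lines meet it,
which agrees with the last row of the table.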

As a second case study, suppose P_A is required to be at least 0.94
in order to obtain a good turn-around time. Assume that the following
parameters are given.

1. Processor order (s, p), such that s·p = 28.

2. N = 1024.

3. (a, c) = (1, c), such that 1 ≤ c < s.

4. The segment time unit, τ, is fixed.

Suppose again that the nonbuffered system is preferred. Below is a list
of the possible processor-memory configurations that achieve the required
P_A so that the number of distinct instruction streams, sp = 28. This
table is constructed by choosing all possible s and p combinations such
that s·p = 28 and s > c. Assuming that c = s - 3, find the minimum ℓ
such that P_A ≥ 0.94. Then, m = N/ℓ.


Processor     Memory     Memory            Bandwidth
order         cycle      configuration     (no. accepted
(s, p)        c          (ℓ, m)            requests/STU)

(4, 7)        1          (64, 16)          7 × 0.94
(7, 4)        4          (32, 32)          4 × 0.94
(14, 2)       11         (16, 64)          2 × 0.94
(28, 1)       25         (1, 1024)         1 × 0.94

If the smallest ℓ is desirable, the processor order is (s, p) = (28, 1)
and the memory configuration is (ℓ, m) = (1, 1024). For this combination,
c = 25. Hence if τ = 50 ns, then the absolute memory cycle
c_o = cτ = 1250 ns. All the above options satisfy the P_A requirement and
again the cost-effectiveness of each combination should be considered. It
is interesting to see how the bandwidth varies with the processor-memory
configuration. As the bandwidth column indicates, the best performance is
obtained by the processor order (s, p) = (4, 7) having access to the
(64, 16) memory configuration. Notice that in this case study, the
bandwidth is proportional to the number of processors, p. Hence there is
a freedom of choice of bandwidth since designs with a variety of B are
attainable.
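
The first two columns of the table in this case study follow directly
from the stated constraints, and a short Python sketch can enumerate
them. The function name and the `c_offset` parameter are illustrative;
the sketch does not attempt the search for the minimum ℓ, which requires
the results of chapter 3.

```python
def processor_options(streams, c_offset=3):
    """Enumerate (s, p, c) with s * p = streams, c = s - c_offset, c >= 1."""
    options = []
    for s in range(1, streams + 1):
        if streams % s == 0:
            c = s - c_offset
            if c >= 1:  # the case study requires 1 <= c < s
                options.append((s, streams // s, c))
    return options


options = processor_options(28)
tau_ns = 50
for s, p, c in options:
    print(f"(s, p) = ({s}, {p}), c = {c}, c_o = c * tau = {c * tau_ns} ns")
```

The divisors 1 and 2 of 28 are excluded because they would give c < 1,
leaving the four processor orders listed in the table.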

The third case study illustrates the use of buffering to reduce the
number of lines for various systems. Two examples are given. For the
first example, assume that the objective is a minimum number of lines, ℓ,
such that the bandwidth, B ≥ 14.00 requests per memory cycle. Assume also
that the following parameters are fixed.

1. Processor order, (s, p) = (5, 4).

2. Two effective module characteristics are obtainable, namely
(a_e, c_e) = (1, 4.67) and (a_e, c_e) = (2, 4.67).

3. The segment time unit, τ, is fixed.

Memory configurations with two possible memory sizes are investigated,
namely, N = 64 and N = 1024. Figures 5.8.1 and 5.8.2 show, for the BRP and
NRP systems, the bandwidth versus memory configurations for (a_e, c_e) =
(1, 4.67) and (a_e, c_e) = (2, 4.67) respectively for N = 64. Similarly,
figures 5.8.3 and 5.8.4 illustrate, for the BRP and NRP systems, the
bandwidth versus memory configurations for (a_e, c_e) = (1, 4.67) and
(a_e, c_e) = (2, 4.67) respectively for N = 1024. All the above figures
were derived from the simulation results. From these curves, the
following options which satisfy the bandwidth requirement are obtained as
marked on the curves.


Figure 5.8.1 Bandwidth Versus Memory Configuration for N = 64 and
(a_e, c_e) = (1, 4.67)

Figure 5.8.2 Bandwidth Versus Memory Configuration for N = 64 and
(a_e, c_e) = (2, 4.67)

Figure 5.8.3 Bandwidth Versus Memory Configuration for N = 1024
and (a_e, c_e) = (1, 4.67)

Figure 5.8.4 Bandwidth Versus Memory Configuration for N = 1024
and (a_e, c_e) = (2, 4.67)

                   a = 1                 a = 2
               NRP       BRP         NRP       BRP

N = 64         ℓ = 32    ℓ = 4       ℓ = 64    ℓ = 16
N = 1024       ℓ = 8     ℓ = 4       ℓ = 32    ℓ = 16

In this example, the memory configuration which gives the least ℓ
is the BRP system with (ℓ, m) = (4, 16) for (a_e, c_e) = (1, 4.67). The
memory configuration (ℓ, m) = (4, 256) also gives the least ℓ. However,
in this case, N = 1024 does not reduce ℓ and is thus less cost-effective
than the (4, 16) configuration. If modules with module characteristics
(a_e, c_e) = (2, 4.67) are cheaper than (a_e, c_e) = (1, 4.67), then one
might consider trading the memory configuration (ℓ, m) = (4, 16), using
(1, 4.67) module characteristics, for (ℓ, m) = (16, 4), using (2, 4.67)
module characteristics. In the absence of current technological costs,
the solution is not obvious. However, the model provides many choices.

In the second example of this case study, let the bandwidth be at
least 18.00. Assume that the parameters are fixed as in the previous
example. The memory size with N = 64 is easily ruled out as a
possibility, since there exists no configuration with N = 64 that achieves
a bandwidth of at least 18.0. From figures 5.8.3 and 5.8.4, the options
for N = 1024 can be obtained as listed below.

                   a = 1                 a = 2
               NRP       BRP         NRP       BRP

N = 1024       ℓ = 64    ℓ = 32      ℓ = 256   ℓ = 128

In this example, the memory configuration with the least number of lines
is the BRP system with (ℓ, m) = (32, 32) and (a, c) = (1, 4.67). Notice
that for (a, c) = (2, 4.67) buffering also reduces the number of
lines.

Hence buffering can indeed be used to reduce the number of lines
in some cases while maintaining the bandwidth. However, in addition to
buffering costs, the buffering system requires time for update and
maintenance of the queues, which is a source of overhead incurred by the
system. The buffering system, with its benefits of reducing ℓ while
keeping performance constant or increasing the performance for ℓ in the
vicinity of ap, should be weighed against the overhead incurred in the
BRP system in order to arrive at a cost-effective design.

5.9  Burst  Mode  Operation 

So far, we have investigated the effects of various parameters on
the performance for a parallel-pipelined processor of order (s, p), where
p simultaneous requests are issued to the memory system every segment
time unit. We have assumed that these p parallel requests are dispatched
simultaneously to the arbitration logic to determine which of them can
be accepted for service in the memory system. The mode of operation in
which p parallel requests are dispatched simultaneously every STU to the
accept/reject logic for processing is called multiplex mode.

We have shown that buffering each of the p parallel requests in its
associated processor's buffer may improve the performance. In this
scheme, the multiplex mode of operation is also in effect, since we have
assumed that at most one memory request from each buffer is obtained for
service in the memory system every STU. Hence at most p requests are
accepted simultaneously. In this section, we will describe a different
buffering scheme which necessitates a new mode of operation. We
will also investigate the effect of such schemes on the performance.

In one memory cycle, a total of cp memory requests are issued.
These cp memory requests are buffered in a buffer of length cp, whereupon
the cp memory requests are dispatched at the end of the memory cycle to
the accept/reject logic in order to determine which of the cp requests
can be accepted for service in the memory. The mode of operation
in which the total number of memory requests issued in one memory cycle
are dispatched simultaneously to the accept/reject logic at the end of
the memory cycle is called burst mode. The multiplex and burst mode
operations are illustrated in figure 5.9.1.

If the segment time unit is τ in the multiplex mode, then the
effective segment time unit in the burst mode is τ' = cτ. Notice that the
memory configuration and the absolute module characteristics are
unaltered by the choice of the mode of operation. Hence every τ' seconds,
cp memory requests are dispatched. We will investigate the performance
of the burst mode operation and compare it with the multiplex mode
operation. It will be shown that for some module characteristics, (a, c),
and memory configurations, the burst mode outperforms the multiplex mode
and vice versa.

Two extreme cases of module characteristics will be investigated to
draw a comparison between the two modes of operation. The probability
of acceptance will serve to indicate the performance of the various module
characteristics. For a = c, the multiplex mode gives the probability of
acceptance, P_A(a, c, p), from theorem 3.5.3 as

$$P_A(c, c, p) = \frac{\frac{\ell}{p}\left[1 - \left(1 - \frac{1}{\ell}\right)^{p}\right]}{1 + \left[1 - \left(1 - \frac{1}{\ell}\right)^{p}\right](c - 1)}.$$

Figure 5.9.1. Multiplex and burst mode sequencing.


Since the segment time unit for the burst mode is τ', which is equal
to the absolute memory cycle of the module, cτ, the relative address
and memory cycles are both equal to 1. Since the number of parallel
requests dispatched every τ' seconds is cp, the probability of acceptance
for the burst mode is, from theorem 3.5.2,

$$P_A(1, 1, cp) = \frac{\ell}{cp}\left[1 - \left(1 - \frac{1}{\ell}\right)^{cp}\right].$$


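Theorem 5.9.1  For a = c, P_A(1, 1, cp) ≥ P_A(c, c, p), for all memory
configurations.

Proof: We must show that

$$\frac{\ell}{cp}\left[1 - \left(1 - \frac{1}{\ell}\right)^{cp}\right] \ge \frac{\frac{\ell}{p}\left[1 - \left(1 - \frac{1}{\ell}\right)^{p}\right]}{1 + \left[1 - \left(1 - \frac{1}{\ell}\right)^{p}\right](c - 1)}$$

For ℓ = 1, it is trivial to show that P_A(1, 1, cp) = P_A(c, c, p).

Assume that ℓ > 1. Let y = (1 - 1/ℓ)^p. Since p ≥ 1, for ℓ > 1, 0 < y < 1.

Then we must show that

$$\frac{1 - y^{c}}{c} \ge \frac{1 - y}{1 + (1 - y)(c - 1)}$$

That is, we need to show that

$$(1 - y^{c})\left(1 - y + \frac{y}{c}\right) \ge (1 - y)$$

Multiplying the left hand side out, we obtain

$$1 - y + \frac{y}{c} - y^{c} + y^{c+1} - \frac{y^{c+1}}{c} = (1 - y) + y^{c}(y - 1) + \frac{y}{c}(1 - y^{c})$$

Hence we need to show that

$$y^{c}(y - 1) + \frac{y}{c}(1 - y^{c}) \ge 0, \quad\text{or}\quad \frac{y}{c}(1 - y^{c}) \ge (1 - y)\,y^{c},$$

or

$$\frac{1 - y^{c}}{y^{c-1}(1 - y)} \ge c$$

However,

$$\frac{1 - y^{c}}{1 - y} = 1 + y + y^{2} + \dots + y^{c-2} + y^{c-1}$$

Hence

$$\frac{1 - y^{c}}{y^{c-1}(1 - y)} = \frac{1}{y^{c-1}} + \frac{1}{y^{c-2}} + \dots + \frac{1}{y} + 1 = \sum_{i=0}^{c-1} \frac{1}{y^{i}}$$

Since y < 1, 1/y^i ≥ 1, for 0 ≤ i ≤ c - 1.

Hence,

$$\sum_{i=0}^{c-1} \frac{1}{y^{i}} \ge c,$$

and the theorem holds.

□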
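The inequality of Theorem 5.9.1 can also be spot-checked numerically. The following is a minimal sketch in exact rational arithmetic, using the two closed-form expressions as stated above; the function names and the grid limits are hypothetical, and a finite grid check is of course no substitute for the proof.

```python
from fractions import Fraction


def multiplex_acceptance(lines: int, c: int, p: int) -> Fraction:
    """P_A(c, c, p): multiplex mode with module characteristics a = c."""
    hit = 1 - Fraction(lines - 1, lines) ** p  # 1 - y, with y = (1 - 1/l)^p
    return Fraction(lines, p) * hit / (1 + hit * (c - 1))


def burst_acceptance(lines: int, c: int, p: int) -> Fraction:
    """P_A(1, 1, cp): burst mode, cp requests dispatched together."""
    n = c * p
    return Fraction(lines, n) * (1 - Fraction(lines - 1, lines) ** n)


def theorem_holds(max_lines: int = 8, max_c: int = 6, max_p: int = 6) -> bool:
    """Check P_A(1, 1, cp) >= P_A(c, c, p) on a finite grid of parameters."""
    return all(
        burst_acceptance(l, c, p) >= multiplex_acceptance(l, c, p)
        for l in range(1, max_lines + 1)
        for c in range(1, max_c + 1)
        for p in range(1, max_p + 1)
    )


if __name__ == "__main__":
    print(theorem_holds())
    print(multiplex_acceptance(4, 2, 2), burst_acceptance(4, 2, 2))
```

For ℓ = 4, c = 2, p = 2 the two values are 14/23 and 175/256, so the burst mode is ahead; for ℓ = 1 or c = 1 the two expressions coincide.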


Therefore, for module characteristics (a, c), such that a = c, the
burst mode of operation has as good or better probability of acceptance
than the multiplex mode of operation for any memory configuration. Notice
that the bandwidths, expressed as the number of memory requests accepted
per memory cycle, for the burst and multiplex modes are cp·P_A(1, 1, cp)
and cp·P_A(c, c, p) respectively. Hence the bandwidth of the burst mode
is also as good or better than that of the multiplex mode. In addition
to the superior performance of the burst mode, it eliminates the need
for the functional unit which is otherwise required to test for busy line
collisions. However, the cp by ℓ crossbar switch required for the burst
mode is more complex than the p by ℓ crossbar switch for the multiplex
mode. This complexity may significantly increase the cost of implementing
the burst mode for c > 1. Notice that for c = 1, the burst and multiplex
modes are equivalent. The burst mode may also require a large number
of tests to detect multiple access line collisions, since cp may be
large. Furthermore, the bus widths and buffering costs are higher for
the burst mode.
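The tradeoff between bandwidth and switch complexity described above can be tabulated for a given design point. The sketch below is only illustrative: it uses the closed-form acceptance probabilities for a = c, and it takes the crosspoint count of a crossbar (inputs times outputs) as a rough proxy for switch complexity, which is an assumption of this example and not a cost model from the text. The function name `compare_modes` is hypothetical.

```python
from fractions import Fraction


def compare_modes(lines: int, c: int, p: int) -> dict:
    """Bandwidth per memory cycle and crossbar size for both modes (a = c)."""
    y = Fraction(lines - 1, lines) ** p
    p_multiplex = Fraction(lines, p) * (1 - y) / (1 + (1 - y) * (c - 1))
    p_burst = Fraction(lines, c * p) * (1 - y ** c)
    return {
        # requests accepted per memory cycle: cp * P_A
        "bandwidth_multiplex": c * p * p_multiplex,
        "bandwidth_burst": c * p * p_burst,
        # crosspoints of a p-by-l and a cp-by-l crossbar (assumed complexity proxy)
        "crosspoints_multiplex": p * lines,
        "crosspoints_burst": c * p * lines,
    }


if __name__ == "__main__":
    for key, value in compare_modes(lines=4, c=2, p=2).items():
        print(key, value)
```

For ℓ = 4, c = 2, p = 2 the burst mode accepts 175/64 requests per memory cycle against 56/23 for the multiplex mode, but needs a switch with twice as many crosspoints.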




Notice, however, that for the burst mode, the expected number of
busy lines at the end of a memory cycle is zero, since c = 1 for the
burst mode. Hence all the ℓ lines are always available for service
when the cp parallel requests are dispatched simultaneously.

The  other  extreme  case  to  be  considered  is  the  memory  system  with 
module  characteristics  (a,  c)  = (1,  c) . 


Theorem 5.9.2  For a = 1, P_A(1, 1, cp) ≥ P_A(1, c, p), if ℓ = N. □


The proof follows immediately by substituting ℓ for N in the expression
for P_A(1, c, p),

[1 - (1 - |)P] ^

I + [I - (1 - [)P] ^

and applying Theorem 5.9.1.

Thus for the case a = 1, ℓ = N, the burst mode of operation is as
good or better than the multiplex mode. For ℓ < N, a proof has not been
obtained. However, the following hypothesis seems to hold.


Hypothesis 5.9.1  For a = 1, P_A(1, 1, cp) ≤ P_A(1, c, p), if ℓ < N. □


This hypothesis has been shown to be true for a wide variety of memory
configurations (ℓ, m), and module characteristics (1, c). Assuming that
this hypothesis is true in general, the multiplex mode exhibits as good
or better performance than the burst mode when a = 1 and ℓ < N.

The above theorems and hypothesis suggest that for module
characteristics (a, c), such that 1 < a < c, there may exist some memory
configurations for which P_A(a, c, p) > P_A(1, 1, cp), and some for which
P_A(a, c, p) < P_A(1, 1, cp). Examples of the former may be for small a
and ℓ < N. Examples of the latter may be for large a or ℓ = N.

Although in some cases the burst mode may exhibit a better performance
than the multiplex mode, its cost-effectiveness is questionable.
The burst mode operation requires that all incoming memory requests within
a memory cycle be buffered and dispatched simultaneously at the end of
the memory cycle to the memory system. This mode of operation may affect
the synchronous nature of operation in the pipelined processor, and operands
may have to be buffered and a scheduling scheme introduced to maintain
correct sequencing of the processes.




6.  CONCLUSIONS 


6.1  Summary of Results


The object of this research is to develop a flexible semiconductor
memory organization for parallel-pipelined processors and investigate
the effect of memory interference on the system performance for a variety
of module characteristics, processor orders and memory configurations.
We exploited the characteristics of semiconductor memories to
develop a general yet flexible memory organization for multiprocessor
systems. We have fully investigated the effect of the memory interference
on the system performance for two classes of module characteristics,
namely, (a, c) such that a = 1 and c ≥ 1, and (a, c) such that a > 1
and a ≤ c ≤ 2a. We discovered that the complexity of the Markov analysis
grows nonlinearly with ⌈c/a⌉ for a > 1. Hence we do not have a general
solution of performance for arbitrary (a, c).

In chapter two, the memory organization was described. It was seen
that the memory interference problem is more complicated than models
developed by previous investigators. However, the model lends itself to
Markov analysis.

In chapter three, we developed the analytical model and introduced
the probability of acceptance of a request to evaluate the effect of
memory interference on the system performance. Since a general expression
for P_A(a, c, p) is not known for arbitrary module characteristics (a, c),
we have obtained the lower and upper bounds on P_A(a, c, p). We observed
that the performance of the analytic model is very weak for memory
configurations (ℓ, m), such that ℓ is in the vicinity of ap. Furthermore,
it was found that for any memory configuration, the module characteristics
with a = 1 give the best performance.

In chapter four, we investigated simulation models for the buffered
and nonbuffered request processor system. In the nonbuffered request
processor system, rejected requests were resubmitted one instruction
cycle later with the same address. We demonstrated that the results of
the experimental model were not significantly different from the results
of the analytical model. This justifies the assumption that the discarding
of rejected requests, for analytical purposes, does not cause
a significant deviation of the analytical model from reality. The buffered
scheme was shown to be as good or better than the nonbuffered scheme for
any memory configuration and module characteristics.

In chapter five, the effects of the various parameters on performance
were investigated. We found that for a very large number of memory modules,
N, the effect of the memory cycle, c, is insignificant.

There is generally less payoff to increasing N for large ℓ and small
p, and for small ℓ and large p. However, there is significant payoff to
increasing N for small ℓ and small p, and for large ℓ and large p.

Memory configurations where ℓ < p give poor performance. The
performance deteriorates as the address cycle, a, is increased in this
region. For ℓ = p, there is a point of inflection. For a > 1, there is
a significant payoff to increasing ℓ when ℓ lies between p and ap.

The effect of module characteristics on performance is minimal when ℓ
and N are sufficiently large. We have shown that for small ℓ, the effect
of the address cycle, a, can be drastic. In general, a is the
critical factor when ℓ is small and N is large. Hence when possible,
a module characteristic with a = 1 should be chosen. Furthermore, we
showed that for ℓ = N, the performance is independent of the address
cycle, a.

The processor speed is another critical factor that determines the
bandwidth of the system. We have shown some illustrations of the effect
of the processor speed on the performance.

Buffering has its maximum effect on performance when the memory
configuration is such that ℓ = ap. However, for ℓ < p, and for large
ℓ and N, buffering tends to have very little effect on performance.
Hence buffering can be effectively used for two purposes: first, to
increase the performance for ℓ in the vicinity of ap; second, if ℓ > ap,
to reduce the number of lines, ℓ, while maintaining the bandwidth.

We have shown, by some design tradeoff examples, that there exists
a wide variety of design options open to the designer. However, the
designer should make judicious choices at each design step to
optimize the cost-effectiveness of the design.

6.2  Suggestions  for  Further  Research 

No attempt has been made to develop an analytical model for the
buffered request processor system. Although we have shown that the
analysis for the nonbuffered scheme can become intractably complex, it
may be possible to develop some approximate queueing models for the
buffered scheme.




A possible extension of this thesis may be directed towards developing
a model for dynamic memories and investigating their performance in
multiprocessor systems. It may be possible to characterize the dynamic
memory module as a function of the access time distributions, which depend
on the module size, cell interconnection patterns and allowable memory
transformations.

Software development for multiprocessor systems that utilize the
memory organization discussed in this thesis should be investigated.
Memory allocation for different processes may also affect the performance
significantly and merits study. In practice, job
sequencing in such computer systems may be complex.

Pipelining the interconnection network between processor
and memory could significantly increase the efficiency of detecting
memory collisions and routing accepted requests to their respective
addressed modules, and should be studied.

Finally, the interrelationship of the results presented here and the
design and management of a memory hierarchy is a most important problem.




APPENDIX  A 

In order to calculate the probability of acceptance, P_A(a, c, p), for
the system with module characteristics (a, c) = (2, 4) and processor order
p = 1 from the system state graph, we will reproduce the reduced system
state graph G_R(a, c) for module characteristics (a, c) = (2, 4). G_R(2, 4)
is shown in Figure A.

Let P_A, P_B, P_C, P_D, P_E, P_F, P_G and P_I denote the steady state
probability of being in system states [0], [(1)], [(2)], [(1)(2)], [(3)],
[(1)(3)], [(2)(3)] and [(1)(2)(3)] respectively. We can therefore proceed
to write the steady state equations for G_R(2, 4) as follows, noting that
N = ℓm.

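$$P_A = \frac{1}{N}P_E \tag{1}$$

$$P_B = P_A + \frac{N-1}{N}P_E + \frac{m-1}{N}(P_C + P_G) \tag{2}$$

$$P_C = \frac{m}{N}P_B + \frac{m+1}{N}P_F \tag{3}$$

$$P_D = \frac{N-m}{N}P_B + \frac{m-1}{N}(P_D + P_I) + \frac{N-m-1}{N}P_F \tag{4}$$

$$P_E = \frac{1}{N}P_C + \frac{2}{N}P_G \tag{5}$$

$$P_F = \frac{N-m}{N}P_C + \frac{N-m-1}{N}P_G \tag{6}$$

$$P_G = \frac{m+1}{N}P_D + \frac{m+2}{N}P_I \tag{7}$$

$$P_I = \frac{N-2m}{N}P_D + \frac{N-2m-1}{N}P_I \tag{8}$$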


Figure A.  Reduced System State Graph, G_R(2, 4)


P_A(2, 4, 1) is the probability of being in any of the system acceptance
states in G_R(2, 4). The system acceptance states in G_R(2, 4) are
[(1)], [(1)(2)], [(1)(3)] and [(1)(2)(3)]. Hence,

$$P_A(2, 4, 1) = P_B + P_D + P_F + P_I \tag{9}$$

Moreover, since the system would have to be in one of the system states,

$$P_A + P_B + P_C + P_D + P_E + P_F + P_G + P_I = 1 \tag{10}$$

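Substituting for P_A in equation 2,

$$P_B = P_E + \frac{m-1}{N}(P_C + P_G) \tag{11}$$

Substituting for P_E from equation 5 in the P_B expression,

$$P_B = \frac{1}{N}P_C + \frac{2}{N}P_G + \frac{m-1}{N}(P_C + P_G)$$

Simplifying,

$$P_B = \frac{m}{N}P_C + \frac{m+1}{N}P_G \tag{12}$$

Substituting for P_B and P_F in equation 3,

$$P_C = \frac{m}{N}\left(\frac{m}{N}P_C + \frac{m+1}{N}P_G\right) + \frac{m+1}{N}\left(\frac{N-m}{N}P_C + \frac{N-m-1}{N}P_G\right)$$

Simplifying,

$$P_C = \frac{m+1}{N-m}P_G \tag{13}$$

Substituting the P_C expression from equation 13 in 12,

$$P_B = \left(\frac{m}{N}\cdot\frac{m+1}{N-m} + \frac{m+1}{N}\cdot\frac{N-m}{N-m}\right)P_G$$

Therefore,

$$P_B = \frac{m+1}{N-m}P_G \tag{14}$$

Notice that P_B = P_C.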


Substituting the P_C expression from equation 13 in 5,

$$P_E = \left(\frac{1}{N}\cdot\frac{m+1}{N-m} + \frac{2}{N}\cdot\frac{N-m}{N-m}\right)P_G$$

Therefore,

$$P_E = \frac{2N-m+1}{N(N-m)}P_G \tag{15}$$

Substituting the P_E expression from 15 in 1,

$$P_A = \frac{2N-m+1}{N^{2}(N-m)}P_G \tag{16}$$

Substituting the P_C expression from 13 in 6,

$$P_F = \left(\frac{N-m}{N}\cdot\frac{m+1}{N-m} + \frac{N-m-1}{N}\cdot\frac{N-m}{N-m}\right)P_G$$

Simplifying,

$$P_F = P_G \tag{17}$$

From equation 8,

$$P_I = \frac{N-2m}{2m+1}P_D \tag{18}$$

Substituting P_B, P_F and P_I expressions from 14, 17 and 18 into 4,

$$P_D = \frac{N-m}{N}\cdot\frac{m+1}{N-m}P_G + \frac{m-1}{N}\left(1 + \frac{N-2m}{2m+1}\right)P_D + \frac{N-m-1}{N}P_G$$

Simplifying,

$$P_D = P_G + \frac{(N+1)(m-1)}{N(2m+1)}P_D$$

Simplifying further,

$$P_D = \frac{N(2m+1)}{Nm+2N-m+1}P_G \tag{19}$$

Hence from equation 18,

$$P_I = \frac{N(N-2m)}{Nm+2N-m+1}P_G \tag{20}$$

Substituting P_A, P_B, P_C, P_D, P_E, P_F, and P_I expressions from 16, 14, 13,
19, 15, 17, and 20 respectively in 10,

$$\left[\frac{2N-m+1}{N^{2}(N-m)} + \frac{2(m+1)}{N-m} + \frac{N(2m+1)}{Nm+2N-m+1} + \frac{2N-m+1}{N(N-m)} + 1 + \frac{N(N-2m)}{Nm+2N-m+1} + 1\right]P_G = 1$$
Expanding and simplifying,

$$P_G = \frac{N^{2}(N-m)(Nm+2N-m+1)}{(N^{2}+Nm+2N-m+1)(N^{2}+2N-m+1)(N+1)} \tag{21}$$

Substituting P_B, P_D, P_F and P_I expressions from 14, 19, 17, and 20
respectively in 9,

$$P_A(2, 4, 1) = \frac{m+1}{N-m}P_G + \frac{N(2m+1)}{Nm+2N-m+1}P_G + P_G + \frac{N(N-2m)}{Nm+2N-m+1}P_G$$

Simplifying,

$$P_A(2, 4, 1) = \frac{(N^{2}+2N-m+1)(N+1)}{(N-m)(Nm+2N-m+1)}P_G \tag{22}$$

Substituting the P_G expression from 21 in 22,

$$P_A(2, 4, 1) = \frac{(N^{2}+2N-m+1)(N+1)}{(N-m)(Nm+2N-m+1)}\cdot\frac{N^{2}(N-m)(Nm+2N-m+1)}{(N^{2}+Nm+2N-m+1)(N^{2}+2N-m+1)(N+1)}$$

Cancelling common terms in numerator and denominator,

$$P_A(2, 4, 1) = \frac{N^{2}}{N^{2}+Nm+2N-m+1} \tag{23}$$



It is obvious that deriving an expression for P_A(2, 4, 1) in terms of
N and m only is very tedious. This technique cannot be applied easily to
a system state graph with a large number of states. Some computer
assistance will probably be required as the sparse matrix gets larger for
a > 1 and ⌈c/a⌉ ≥ 2.
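As an illustration of the kind of computer assistance mentioned above, the steady state equations can be solved exactly by machine and compared with equation 23. The sketch below is not the author's program. It assumes the one-step transition probabilities implied by balance equations 1 to 8 as read from the derivation in this appendix (they are repeated in the code so that the assumption is explicit), and it requires N ≥ 2m + 1 so that every probability is non-negative. All function names are hypothetical.

```python
from fractions import Fraction

STATES = ["A", "B", "C", "D", "E", "F", "G", "I"]


def transition_probabilities(N: int, m: int) -> dict:
    """T[src][dst] as implied by balance equations 1-8 (assumed reading)."""
    def q(k: int) -> Fraction:
        return Fraction(k, N)

    return {
        "A": {"B": q(N)},
        "B": {"C": q(m), "D": q(N - m)},
        "C": {"B": q(m - 1), "E": q(1), "F": q(N - m)},
        "D": {"D": q(m - 1), "G": q(m + 1), "I": q(N - 2 * m)},
        "E": {"A": q(1), "B": q(N - 1)},
        "F": {"C": q(m + 1), "D": q(N - m - 1)},
        "G": {"B": q(m - 1), "E": q(2), "F": q(N - m - 1)},
        "I": {"D": q(m - 1), "G": q(m + 2), "I": q(N - 2 * m - 1)},
    }


def steady_state(N: int, m: int) -> dict:
    """Solve the balance equations plus normalization by Gauss-Jordan elimination."""
    T = transition_probabilities(N, m)
    n = len(STATES)
    rows = []
    # One balance equation per state except the last, which is redundant.
    for dst in STATES[:-1]:
        row = [T[src].get(dst, Fraction(0)) - (1 if src == dst else 0)
               for src in STATES]
        rows.append(row + [Fraction(0)])
    # Normalization: the state probabilities sum to one (equation 10).
    rows.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return {state: rows[i][n] for i, state in enumerate(STATES)}


def acceptance_probability(N: int, m: int) -> Fraction:
    """Equation 9: sum over the acceptance states B, D, F and I."""
    pi = steady_state(N, m)
    return pi["B"] + pi["D"] + pi["F"] + pi["I"]


def closed_form(N: int, m: int) -> Fraction:
    """Equation 23: N^2 / (N^2 + Nm + 2N - m + 1)."""
    return Fraction(N * N, N * N + N * m + 2 * N - m + 1)


if __name__ == "__main__":
    for N, m in [(4, 1), (8, 2), (12, 3), (16, 4)]:
        print(N, m, acceptance_probability(N, m), closed_form(N, m))
```

Exact rational arithmetic is used so that agreement with equation 23 is an identity rather than a floating-point coincidence. For larger state graphs a sparse numerical solver would be the more practical choice.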




LIST  OF  REFERENCES 

1.  Flynn,  M.J.,  "Very  High  Speed  Computing  Systems,"  Proc.  IEEE, 

Vol . 5^,  No.  12,  pp.  1901-1909,  December  1966. 

2.  Barnes,  G.H.,  "The  ILLIAC  IV  Computer,"  IEEE  Trans.  Comput., 
pp.  7^6-757.  August  1968. 

3.  Anderson,  D.W.,  et.  al.,  "The  IBM  System/360  Model  91:  Machine 
Philosophy  and  Instruction-Handling,"  IBM  J.  of  Res,  and  Dev., 
pp.  8-24,  January  1967. 

4.  Hintz,  R.G.,  and  Tate,  D.P.,  "Control  Data  STAR-100  Processor 
Design,"  Proc.  Compcon  Fall  72,  pp.  1-4,  September  1972. 

5.  Watson,  W.J.,  "The  Tl  ASC  - A Highly  Modular  and  Flexible  Com- 
puter Architecture."  Proc.  FJCC  1972,  pp.  221-228. 

6.  Hodges,  D.A.,  Semiconductor  Memories,  IEEE  Press,  1972. 

7.  Davidson,  E.S.,  "Scheduling  for  Pipelined  Processors,"  Proc.  7th 
Annual  Hawaii  Inti.  Conf.  on  Systems  Sciences,  pp.  5S-6O,  January 

8.  Shar,  L.E.,  and  Davidson,  E.S.,  "A  Multimini-Processor  System 
Implemented  Through  Pipelining,"  Computer,  Vol.  7,  No.  2,  pp. 

42-51,  February  1974. 

9.  Davidson,  E.S.,  et.  al.,  "Effective  Control  for  Pipelined  Com- 
puters ,"  P£oc^;__Com£con_S££j£|^^  pp.  181-184,  February  1975. 

10.  Weller,  D.L.,  and  Davidson,  E.S.,  "Optimal  Searching  Algorithms 
for  Parallel-Pipelined  Computers,"  Spr inger-Ver lag  Lecture  Notes. 
Mo.  24,  pp.  90"93,  August  1975. 

11.  Danielsson,  P-E,  and  Gudmundsson,  B.,  "Time-Shared  Memory-Processor 
Interface,"  1975  Sagamore  Comput.  Conf.  Parallel  Processing,  pp. 
90-98,  August  1975. 

12.  Hellerman,  H.,  Digital  Computer  System  Principles,  New  York: 
McGraw-Hill,  1967,  PP.  223-229. 

13.  Knuth,  D.E.,  and  Rao,  G.S.,  "Activity  in  Interleaved  Memory," 

IEEE  Trans.  Comput.,  Vol.  C-24,  No.  9,  pp.  943~944,  September 

1975. 

14.  Burnett,  G.J.,  and  Coffman,  E.G.,  "A  Study  of  Interleaved  Memory 
Systems,"  1970  Spring  Joint  Comput.  Conf .,  AFI PS  Conf . Proc., 

Vol.  36 , Montvale,  N.J.:  AFIPS  Press,  pp.  467-^74,  1970. 


197  - 


15.  Burnett,  G.J.  and  Coffman,  E.G.,  "Analysis  of  Interleaved  Memory 
Systems  Using  Blockage  Buffers,"  Commun . ACM,  Vol.  18,  No.  2, 
pp.  91-95,  February  1975. 

16.  Skinner,  C.,  and  Asher,  J.,  "Effect  of  Storage  Contention  on 
System  Performance,"  IBM  Syst.  J.,  Vol.  8,  No.  4,  pp.  319“333, 

1969. 

17.  Strecker,  W.D.,  "Analysis  of  the  Instruction  Execution  Rate 

In  Certain  Computer  Structures,"  Ph.D.  dissertation,  Carnegie- 
Mellon  Univ.,  Pittsburgh,  Pa.,  1970. 

18.  Ravi,  C.V.,  "On  the  Bandwidth  and  Interference  In  Multiprocessors," 

IEEE  Trans.  Comput.,  Vol.  C-21  , pp.  899~901,  August  1972. 

19.  Bhandarkar,  D.P.,  "Analysis  of  Memory  Interference  in  Multiprocessors," 
IEEE  Trans.  Comput.,  Vol.  C-2k,  pp.  897“908,  September  1975. 

20.  Sastry,  K.V.,  and  Kain,  R.Y.,  "On  the  Performance  of  Certain  Multi- 
processor Computer  Organizations,"  IEEE  Trans.  Comput.,  Vol.  C-2k, 
pp.  1066-107^,  November  1975. 

21.  Baskett,  F.,  and  Smith,  A.,  "Interference  in  Multiprocessor  Com- 
puter Systems  with  Interleaved  Memory,"  Commun.  ACM,  Vol.  19,  No.  6, 
pp.  327-33A,  June  1976. 

22.  Denning,  P.J.,  "The  Working  Set  Model  for  Program  Behaviour," 

Commun.  Ass.  Comput.  Mach.,  Vol.  11,  pp.  323“333.  May  1968. 

23.  Kleinrock,  L.,  Queueing  Systems,  Vol.  1:  Theory,  Wiley  - Interscience, 

1975. 

24.  Briggs,  F.A.,  and  Davidson,  E.S.,  "Organization  of  Semiconductor 
Memories  for  Para  1 lei -Pi pel ined  Processors,"  IEEE  Trans.  Comput., 

Vol.  C-26,  pp.  162-169,  February  1977.  — — — — — 

25.  Chang,  D.,  et.  al.,  "On  the  Effective  Bandwidth  of  Parallel  Memories," 
Dept,  of  Computer  Science,  Univ.  of  Illinois,  Urbana,  111.,  September 
1975. 


